Helicobacter polypeptides and corresponding polynucleotide molecules

ABSTRACT

The invention provides Helicobacter polypeptides that can be used in vaccination methods for preventing or treating Helicobacter infection, and polynucleotides that encode these polypeptides.

[0001] This is a continuation-in-part application of U.S. Ser. No. 08/749,051, filed on Nov. 14, 1996.

[0002] The invention relates to Helicobacter antigens and corresponding polynucleotide molecules that can be used in methods to prevent or treat Helicobacter infection in mammals, such as humans.

BACKGROUND OF THE INVENTION

[0003] Helicobacter is a genus of spiral, gram-negative bacteria that colonize the gastrointestinal tracts of mammals. Several species colonize the stomach, most notably H. pylori, H. heilmannii, H. felis, and H. mustelae. Although H. pylori is the species most commonly associated with human infection, H. heilmannii and H. felis have also been isolated from humans, at lower frequencies. Helicobacter infects over 50% of adult populations in developed countries and nearly 100% in developing countries and some Pacific Rim countries, making it one of the most prevalent infections worldwide.

[0004] Helicobacter is routinely recovered from gastric biopsies of humans with histological evidence of gastritis and peptic ulceration. Indeed, H. pylori is now recognized as an important pathogen of humans, in that the chronic gastritis it causes is a risk factor for the development of peptic ulcer diseases and gastric carcinoma. It is thus highly desirable to develop safe and effective vaccines for preventing and treating Helicobacter infection.

[0005] A number of Helicobacter antigens have been characterized or isolated. These include urease, which is composed of two structural subunits of approximately 30 and 67 kDa (Hu et al., Infect. Immun. 58:992, 1990; Dunn et al., J. Biol. Chem. 265:9464, 1990; Evans et al., Microbial Pathogenesis 10:15, 1991; Labigne et al., J. Bact. 173:1920, 1991); the 87 kDa vacuolar cytotoxin (VacA) (Cover et al., J. Biol. Chem. 267:10570, 1992; Phadnis et al., Infect. Immun. 62:1557, 1994; WO 93/18150); a 128 kDa immunodominant antigen associated with the cytotoxin (CagA, also called TagA; WO 93/18150; U.S. Pat. No. 5,403,924); 13 and 58 kDa heat shock proteins HspA and HspB (Suerbaum et al., Mol. Microbiol. 14:959, 1994; WO 93/18150); a 54 kDa catalase (Hazell et al., J. Gen. Microbiol. 137:57, 1991); a 15 kDa histidine-rich protein (Hpn) (Gilbert et al., Infect. Immun. 63:2682, 1995); a 20 kDa membrane-associated lipoprotein (Kostrzynska et al., J. Bact. 176:5938, 1994); a 30 kDa outer membrane protein (Bölin et al., J. Clin. Microbiol. 33:381, 1995); a lactoferrin receptor (FR 2,724,936); and several porins, designated HopA, HopB, HopC, HopD, and HopE, which have molecular weights of 48-67 kDa (Exner et al., Infect. Immun. 63:1567, 1995; Doig et al., J. Bact. 177:5447, 1995).

[0006] Some of these proteins have been proposed as potential vaccine antigens. In particular, urease is believed to be a vaccine candidate (WO 94/9823; WO 95/22987; WO 95/3824; Michetti et al., Gastroenterology 107:1002, 1994). Nevertheless, it is thought that several antigens may ultimately be necessary in a vaccine.

SUMMARY OF THE INVENTION

[0007] The invention provides polynucleotide molecules that encode Helicobacter polypeptides, designated GHPO 1012, GHPO 1190, GHPO 1398, GHPO 1501, GHPO 1550, GHPO 1620, GHPO 276, GHPO 329, GHPO 470, GHPO 574, GHPO 689, and GHPO 706, which can be used, e.g., in methods to prevent, treat, or diagnose Helicobacter infection. The polypeptides include those having the amino acid sequences shown in SEQ ID NOs:2-24 (even numbers). Those skilled in the art will understand that the invention also includes polynucleotide molecules that encode mutants and derivatives of these polypeptides, which can result from the addition, deletion, or substitution of non-essential amino acids, as is described further below.

[0008] In addition to the polynucleotide molecules described above, the invention includes the corresponding polypeptides (i.e., polypeptides encoded by the polynucleotide molecules of the invention, or fragments thereof), and monospecific antibodies that specifically bind to these polypeptides.

[0009] The present invention has many applications and includes expression cassettes, vectors, and cells transformed or transfected with the polynucleotides of the invention. Accordingly, the present invention provides (i) methods for producing polypeptides of the invention in recombinant host systems and related expression cassettes, vectors, and transformed or transfected cells; (ii) live vaccine vectors, such as pox virus, Salmonella typhimurium, and Vibrio cholerae vectors, that contain polynucleotides of the invention (such vaccine vectors being useful in, e.g., methods for preventing or treating Helicobacter infection) in combination with a diluent or carrier, and related pharmaceutical compositions and associated therapeutic and/or prophylactic methods; (iii) therapeutic and/or prophylactic methods involving administration of polynucleotide molecules, either in a naked form or formulated with a delivery vehicle, polypeptides or mixtures of polypeptides, or monospecific antibodies of the invention, and related pharmaceutical compositions; (iv) methods for detecting the presence of Helicobacter in biological samples, which can involve the use of polynucleotide molecules, monospecific antibodies, or polypeptides of the invention; and (v) methods for purifying polypeptides of the invention by antibody-based affinity chromatography.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] FIG. 1A is a diagrammatic representation of transposon TnMax9, which is a derivative of the TnMax transposon system (Haas et al., Gene 130:23-31, 1993). The mini-transposon carries the blaM gene, which is the β-lactamase gene lacking a promoter and a signal sequence, next to the inverted repeats (IR) and the M13 forward (M13-FP) and reverse (M13-RP1) primer binding sites. The resolution site (res) and an origin of replication (ori_(fd)) are located between the blaM gene and the constitutive cat_(GC)-resistance gene. The transposase tnpA and resolvase tnpR genes are located outside of the mini-transposon and are under the control of the inducible P_(trc) promoter. The lacIq gene encodes the Lac repressor.

[0011] FIG. 1B is a diagrammatic representation of plasmid pMin2. pMin2 contains a multiple cloning site, the tetracycline resistance gene (tet), an origin of transfer (oriT), an origin of replication (ori_(ColE1)), a transcriptional terminator (t_(fd)), and a weak, constitutive promoter (P_(iga)). H. pylori chromosome fragments were introduced into the BglII and ClaI sites of pMin2.

DETAILED DESCRIPTION

[0012] Open reading frames (ORFs) encoding new, full length, membrane-associated polypeptides, designated GHPO 1012, GHPO 1190, GHPO 1398, GHPO 1501, GHPO 1550, GHPO 1620, GHPO 276, GHPO 329, GHPO 470, GHPO 574, GHPO 689, and GHPO 706, have been identified in the H. pylori genome. These polypeptides can be used, for example, in vaccination methods for preventing or treating Helicobacter infection. The new polypeptides are secreted polypeptides that can be produced in their mature forms (i.e., as polypeptides that have been exported through class II or class III secretion pathways) or as precursors that include signal peptides, which can be removed in the course of excretion/secretion by cleavage at the N-terminal end of the mature form. (The cleavage site is located at the C-terminal end of the signal peptide, adjacent to the mature form.) The cleavage sites for the new polypeptides and, thus, the first amino acids of the mature polypeptides, were putatively determined.

[0013] According to a first aspect of the invention, there are provided isolated polynucleotides that encode the precursor and mature forms of Helicobacter GHPO 1012, GHPO 1190, GHPO 1398, GHPO 1501, GHPO 1550, GHPO 1620, GHPO 276, GHPO 329, GHPO 470, GHPO 574, GHPO 689, and GHPO 706.

[0014] An isolated polynucleotide of the invention encodes:

[0015] (i) a polypeptide having an amino acid sequence that is homologous to a Helicobacter amino acid sequence of a polypeptide associated with the Helicobacter membrane, the Helicobacter amino acid sequence being selected from the group consisting of the amino acid sequences shown:

[0016] in SEQ ID NO:2, beginning with an amino acid in any one of positions −22 to 5, preferably in position −22 or position 1, and ending with an amino acid in position 525 (GHPO 1012);

[0017] in SEQ ID NO:4, beginning with an amino acid in any one of positions −25 to 5, preferably in position −25 or position 1, and ending with an amino acid in position 451 (GHPO 1190);

[0018] in SEQ ID NO:6, beginning with an amino acid in any one of positions −25 to 5, preferably in position −25 or position 1, and ending with an amino acid in position 225 (GHPO 1398);

[0019] in SEQ ID NO:8, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 310 (GHPO 1501);

[0020] in SEQ ID NO:10, beginning with an amino acid in any one of positions −40 to 5, preferably in position −40 or position 1, and ending with an amino acid in position 182 (GHPO 1550);

[0021] in SEQ ID NO:12, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 603 (GHPO 1620);

[0022] in SEQ ID NO:14, beginning with an amino acid in any one of positions −40 to 5, preferably in position −40 or position 1, and ending with an amino acid in position 232 (GHPO 276);

[0023] in SEQ ID NO:16, beginning with an amino acid in any one of positions −37 to 5, preferably in position −37 or position 1, and ending with an amino acid in position 190 (GHPO 329);

[0024] in SEQ ID NO:18, beginning with an amino acid in any one of positions −30 to 5, preferably in position −30 or position 1, and ending with an amino acid in position 989 (GHPO 470);

[0025] in SEQ ID NO:20, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 295 (GHPO 574);

[0026] in SEQ ID NO:22, beginning with an amino acid in any one of positions −21 to 5, preferably in position −21 or position 1, and ending with an amino acid in position 642 (GHPO 689); and

[0027] in SEQ ID NO:24, beginning with an amino acid in any one of positions −23 to 5, preferably in position −23 or position 1, and ending with an amino acid in position 254 (GHPO 706); or

[0028] (ii) a derivative of the polypeptide.

[0029] The term “isolated polynucleotide” is defined as a polynucleotide that is removed from the environment in which it naturally occurs. For example, a naturally-occurring DNA molecule present in the genome of a living bacterium or as part of a gene bank is not isolated, but the same molecule, separated from the remaining part of the bacterial genome, as a result of, e.g., a cloning event (amplification), is “isolated.” Typically, an isolated DNA molecule is free from DNA regions (e.g., coding regions) with which it is immediately contiguous, at the 5′ or 3′ ends, in the naturally occurring genome. Such isolated polynucleotides can be part of a vector or a composition and still be isolated, as such a vector or composition is not part of its natural environment.

[0030] A polynucleotide of the invention can consist of RNA or DNA (e.g., cDNA, genomic DNA, or synthetic DNA), or modifications or combinations of RNA or DNA. The polynucleotide can be double-stranded or single-stranded and, if single-stranded, can be the coding (sense) strand or the non-coding (anti-sense) strand. The sequences that encode polypeptides of the invention, as shown in any of SEQ ID NOs:2-24 (even numbers), can be (a) the coding sequence as shown in any of SEQ ID NOs:1-23 (odd numbers); (b) a ribonucleotide sequence derived by transcription of (a); or (c) a different coding sequence that, as a result of the redundancy or degeneracy of the genetic code, encodes the same polypeptides as the polynucleotide molecules having the sequences illustrated in any of SEQ ID NOs:1-23 (odd numbers). The polypeptide can be one that is naturally secreted or excreted by, e.g., H. felis, H. mustelae, H. heilmannii, or H. pylori.

[0031] By “polypeptide” or “protein” is meant any chain of amino acids, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation). Both terms are used interchangeably in the present application.

[0032] By “homologous amino acid sequence” is meant an amino acid sequence that differs from an amino acid sequence shown in any of SEQ ID NOs:2-24 (even numbers), or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NOs: 1-23 (odd numbers), by one or more non-conservative amino acid substitutions, deletions, or additions located at positions at which they do not destroy the specific antigenicity of the polypeptide. Preferably, such a sequence is at least 75%, more preferably at least 80%, and most preferably at least 90% identical to an amino acid sequence shown in any of SEQ ID NOs:2-24 (even numbers).

[0033] Homologous amino acid sequences include sequences that are identical or substantially identical to an amino acid sequence as shown in any of SEQ ID NOs:2-24 (even numbers). By “amino acid sequence that is substantially identical” is meant a sequence that is at least 90%, preferably at least 95%, more preferably at least 97%, and most preferably at least 99% identical to an amino acid sequence of reference and that differs from the sequence of reference, if at all, by a majority of conservative amino acid substitutions.

[0034] Conservative amino acid substitutions typically include substitutions among amino acids of the same class. These classes include, for example, amino acids having uncharged polar side chains, such as asparagine, glutamine, serine, threonine, and tyrosine; amino acids having basic side chains, such as lysine, arginine, and histidine; amino acids having acidic side chains, such as aspartic acid and glutamic acid; and amino acids having nonpolar side chains, such as glycine, alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan, and cysteine.
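
The side-chain classes listed above can be encoded directly as a lookup table. The following is a minimal sketch only: the one-letter codes, the dictionary, and the function names are illustrative assumptions and are not part of the specification. A substitution is treated as conservative when both residues fall in the same listed class.

```python
# Side-chain classes as listed in the paragraph above, using one-letter codes.
# All names here are hypothetical and chosen only for illustration.
SIDE_CHAIN_CLASSES = {
    "uncharged polar": set("NQSTY"),   # Asn, Gln, Ser, Thr, Tyr
    "basic": set("KRH"),               # Lys, Arg, His
    "acidic": set("DE"),               # Asp, Glu
    "nonpolar": set("GAVLIPFMWC"),     # Gly, Ala, Val, Leu, Ile, Pro, Phe, Met, Trp, Cys
}


def side_chain_class(residue):
    """Return the name of the class that contains the given residue."""
    code = residue.upper()
    for name, members in SIDE_CHAIN_CLASSES.items():
        if code in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")


def is_conservative_substitution(original, replacement):
    """True when two different residues belong to the same class."""
    if original.upper() == replacement.upper():
        return False  # identical residues are not a substitution
    return side_chain_class(original) == side_chain_class(replacement)
```

For instance, under this table a lysine-to-arginine change is conservative, while an aspartic acid-to-lysine change is not.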

[0035] Homology can be measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705). Similar amino acid sequences are aligned to obtain the maximum degree of homology (i.e., identity). To this end, it may be necessary to artificially introduce gaps into the sequence. Once the optimal alignment has been set up, the degree of homology (i.e., identity) is established by recording all of the positions in which the amino acids of both sequences are identical, relative to the total number of positions.
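
The identity calculation described above can be sketched as follows. This is an illustration rather than the method of any particular software package: it assumes the two sequences have already been aligned (gaps written as "-"), and it assumes that "the total number of positions" means the number of alignment columns. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: the denominator is the number of alignment columns,
    including columns that contain a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    # Count columns in which both sequences carry the same residue.
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)
```

A value returned by such a function could then be compared with the thresholds given in the text (e.g., at least 75%, 80%, or 90% identity).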

[0036] Homologous polynucleotide sequences are defined in a similar way. Preferably, a homologous sequence is one that is at least 45%, more preferably at least 60%, and most preferably at least 85% identical to a coding sequence of any of SEQ ID NOs:1-23 (odd numbers).

[0037] Polypeptides having a sequence homologous to any one of the sequences shown in SEQ ID NOs:2-24 (even numbers), include naturally-occurring allelic variants, as well as mutants or any other non-naturally occurring variants that are analogous in terms of antigenicity, to a polypeptide having a sequence as shown in any one of SEQ ID NOs:2-24 (even numbers).

[0038] As is known in the art, an allelic variant is an alternate form of a polypeptide that is characterized as having a substitution, deletion, or addition of one or more amino acids that does not alter the biological function of the polypeptide. By “biological function” is meant a function of the polypeptide in the cells in which it naturally occurs, even if the function is not necessary for the growth or survival of the cells. For example, the biological function of a porin is to allow the entry into cells of compounds present in the extracellular medium. The biological function is distinct from the antigenic function. A polypeptide can have more than one biological function.

[0039] Allelic variants are very common in nature. For example, a bacterial species, e.g., H. pylori, is usually represented by a variety of strains that differ from each other by minor allelic variations. Indeed, a polypeptide that fulfills the same biological function in different strains can have an amino acid sequence that is not identical in each of the strains. Such an allelic variation can be equally reflected at the polynucleotide level.

[0040] Support for the use of allelic variants of polypeptide antigens comes from, e.g., studies of the Helicobacter urease antigen. The amino acid sequence of Helicobacter urease varies widely from species to species, yet cross-species protection occurs, indicating that the urease molecule, when used as an immunogen, is highly tolerant of amino acid variations. Even among different strains of the single species H. pylori, there are amino acid sequence variations.

[0041] For example, although the amino acid sequences of the UreA and UreB subunits of H. pylori and H. felis ureases differ from one another by 26.5% and 11.8%, respectively (Ferrero et al., Molecular Microbiology 9(2):323-333, 1993), it has been shown that H. pylori urease protects mice from H. felis infection (Michetti et al., Gastroenterology 107:1002, 1994). In addition, it has been shown that the individual structural subunits of urease, UreA and UreB, which contain distinct amino acid sequences, are both protective antigens against Helicobacter infection (Michetti et al., supra). Similarly, Cuenca et al. (Gastroenterology 110:1770, 1996) showed that therapeutic immunization of H. mustelae-infected ferrets with H. pylori urease was effective at eradicating H. mustelae infection. Further, several urease variants have been reported to be effective vaccine antigens, including, e.g., recombinant UreA+UreB apoenzyme expressed from pORV142 (UreA and UreB sequences derived from H. pylori strain CPM630; Lee et al., J. Infect. Dis. 172:161, 1995); recombinant UreA+UreB apoenzyme expressed from pORV214 (UreA and UreB sequences differ from H. pylori strain CPM630 by one and two amino acid changes, respectively; Lee et al., supra, 1995); a UreA-glutathione-S-transferase fusion protein (UreA sequence from H. pylori strain ATCC 43504; Thomas et al., Acta Gastro-Enterologica Belgica 56:54, 1993); UreA+UreB holoenzyme purified from H. pylori strain NCTC11637 (Marchetti et al., Science 267:1655, 1995); a UreA-MBP fusion protein (UreA from H. pylori strain 85P; Ferrero et al., Infection and Immunity 62:4981, 1994); a UreB-MBP fusion protein (UreB from H. pylori strain 85P; Ferrero et al., supra); a UreA-MBP fusion protein (UreA from H. felis strain ATCC 49179; Ferrero et al., supra); a UreB-MBP fusion protein (UreB from H. felis strain ATCC 49179; Ferrero et al., supra); and a 37 kDa fragment of UreB containing amino acids 220-569 (Dore-Davin et al., “A 37 kD fragment of UreB is sufficient to confer protection against Helicobacter felis infection in mice”). Finally, Thomas et al. (supra) showed that oral immunization of mice with crude sonicates of H. pylori protected mice from subsequent challenge with H. felis.

[0042] Polynucleotides, e.g., DNA molecules, encoding allelic variants can easily be obtained by polymerase chain reaction (PCR) amplification of genomic bacterial DNA extracted by conventional methods. This involves the use of synthetic oligonucleotide primers matching sequences that are upstream and downstream of the 5′ and 3′ ends of the coding region. Suitable primers can be designed based on the nucleotide sequence information provided in any of SEQ ID NOs: 1-23 (odd numbers). Typically, a primer consists of 10 to 40, preferably 15 to 25 nucleotides. It can also be advantageous to select primers containing C and G nucleotides in proportions sufficient to ensure efficient hybridization, e.g., an amount of C and G nucleotides of at least 40%, preferably 50%, of the total nucleotide amount. Those skilled in the art can readily design primers that can be used to isolate the polynucleotides of the invention from different Helicobacter strains.
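
The primer criteria given above (10 to 40 nucleotides, preferably 15 to 25, and a C+G content of at least 40%, preferably 50%) can be checked with a short routine. The sketch below is illustrative only: the function name, the returned keys, and the reading of "preferably 50%" as "at least 50%" are assumptions made for the example, and any sequence passed to it would be hypothetical.

```python
def primer_report(primer):
    """Check a candidate primer against the length and C+G criteria in the text."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    length = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / length
    return {
        "length": length,
        "gc_percent": gc_percent,
        "length_ok": 10 <= length <= 40,         # typical range
        "length_preferred": 15 <= length <= 25,  # preferred range
        "gc_ok": gc_percent >= 40.0,             # at least 40% C+G
        "gc_preferred": gc_percent >= 50.0,      # assumed reading of "preferably 50%"
    }
```

Such a check does not replace empirical testing of a primer pair; it only screens candidates against the stated numerical ranges.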

[0043] As an example, primers useful for cloning a DNA molecule encoding a polypeptide having the amino acid sequence of GHPO 1012 (SEQ ID NO:2), are shown in SEQ ID NO: (matching at the 5′ end) and in SEQ ID NO: (matching at the 3′ end). Experimental conditions for carrying out PCR can readily be determined by one skilled in the art and an illustration of carrying out PCR is provided in Example 3.

[0044] Thus, the first aspect of the invention includes:

[0045] (i) isolated polynucleotide molecules (e.g., DNA molecules) that can be amplified and/or cloned by the polymerase chain reaction from a Helicobacter, e.g., H. pylori, genome using either:

[0046] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:25, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:27 (unprocessed GHPO 1012);

[0047] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:28, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:30 (unprocessed GHPO 1190);

[0048] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:31, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:33 (unprocessed GHPO 1398);

[0049] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:34, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:36 (unprocessed GHPO 1501);

[0050] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:37, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:39 (unprocessed GHPO 1550);

[0051] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:40, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:42 (unprocessed GHPO 1620);

[0052] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:43, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:45 (unprocessed GHPO 276);

[0053] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:46, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:48 (unprocessed GHPO 329);

[0054] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:49, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:51 (unprocessed GHPO 470);

[0055] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:52, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:54 (unprocessed GHPO 574);

[0056] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:55, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:57 (unprocessed GHPO 689); or

[0057] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:58, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:60 (unprocessed GHPO 706); and

[0058] (ii) isolated DNA molecules that can be amplified and/or cloned by the polymerase chain reaction from a Helicobacter, e.g., H. pylori, genome using either:

[0059] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:26, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:27 (mature GHPO 1012);

[0060] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:29, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:30 (mature GHPO 1190);

[0061] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:32, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:33 (mature GHPO 1398);

[0062] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:35, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:36 (mature GHPO 1501);

[0063] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:38, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:39 (mature GHPO 1550);

[0064] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:41, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:42 (mature GHPO 1620);

[0065] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:44, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:45 (mature GHPO 276);

[0066] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:47, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:48 (mature GHPO 329);

[0067] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:50, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:51 (mature GHPO 470);

[0068] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:53, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:54 (mature GHPO 574);

[0069] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:56, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:57 (mature GHPO 689); or

[0070] a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:59, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:60 (mature GHPO 706).

[0071] In the sequences of SEQ ID NOs:25-60, the letter “N” denotes a restriction endonuclease digestion site that contains, typically, 4 to 6 nucleotides. For example, the sequences 5′-GGATCC-3′ (BamHI) or 5′-CTCGAG-3′ (XhoI) can be used. Restriction sites can be selected by those skilled in the art so that the amplified DNA can be conveniently cloned into an appropriately digested vector, such as a plasmid.

[0072] Useful homologs that do not occur naturally can be designed using known methods for identifying regions of an antigen that are likely to be tolerant of amino acid sequence changes and/or deletions. For example, sequences of the antigen from different species can be compared to identify conserved sequences.

[0073] Polypeptide derivatives that are encoded by polynucleotides of the invention include, e.g., fragments, polypeptides having large internal deletions derived from full-length polypeptides, and fusion proteins. Polypeptide fragments of the invention can be derived from a polypeptide having a sequence homologous to any of the sequences of SEQ ID NOs:2-24 (even numbers), to the extent that the fragments retain the substantial antigenicity of the parent polypeptide (specific antigenicity). Polypeptide derivatives can also be constructed by large internal deletions that remove a substantial part of the parent polypeptide, while retaining specific antigenicity. Generally, polypeptide derivatives should be at least about 12 amino acids in length to maintain antigenicity. Advantageously, they can be at least 20 amino acids, preferably at least 50 amino acids, more preferably at least 75 amino acids, and most preferably at least 100 amino acids in length.

[0074] Useful polypeptide derivatives, e.g., polypeptide fragments, can be designed using computer-assisted analysis of amino acid sequences in order to identify sites in protein antigens having potential as surface-exposed, antigenic regions (Hughes et al., Infect. Immun. 60(9):3497, 1992). For example, the Laser Gene Program from DNA Star can be used to obtain hydrophilicity, antigenic index, and intensity index plots for the polypeptides of the invention. This program can also be used to obtain information about homologies of the polypeptides with known protein motifs. One skilled in the art can readily use the information provided in such plots to select peptide fragments for use as vaccine antigens. For example, fragments spanning regions of the plots in which the antigenic index is relatively high can be selected. One can also select fragments spanning regions in which both the antigenic index and the intensity plots are relatively high. Fragments containing conserved sequences, particularly hydrophilic conserved sequences, can also be selected.
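
As a rough illustration of how a hydrophilicity plot can guide fragment selection, the sketch below computes a sliding-window profile. It does not reproduce the Laser Gene program or its antigenic index; it substitutes the published Kyte-Doolittle hydropathy scale, on which lower averages indicate more hydrophilic stretches. The window size and the function names are assumptions made for the example.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Average hydropathy of each window of the given size along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophilic_window(sequence, window=7):
    """Return (start index, fragment) for the window with the lowest average."""
    profile = hydropathy_profile(sequence, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window]
```

A fragment flagged this way is only a candidate; hydrophilicity alone does not establish surface exposure or antigenicity.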

[0075] Polypeptide fragments and polypeptides having large internal deletions can be used for revealing epitopes that are otherwise masked in the parent polypeptide and that may be of importance for inducing a protective T cell-dependent immune response. Deletions can also remove immunodominant regions of high variability among strains.

[0076] It is an accepted practice in the field of immunology to use fragments and variants of protein immunogens as vaccines, as all that is required to induce an immune response to a protein is a small (e.g., 8 to 10 amino acids) immunogenic region of the protein. This has been done for a number of vaccines against pathogens other than Helicobacter. For example, short synthetic peptides corresponding to surface-exposed antigens of pathogens such as murine mammary tumor virus (peptide containing 11 amino acids; Dion et al., Virology 179:474-477, 1990), Semliki Forest virus (peptide containing 16 amino acids; Snijders et al., J. Gen. Virol. 72:557-565, 1991), and canine parvovirus (2 overlapping peptides, each containing 15 amino acids; Langeveld et al., Vaccine 12(15):1473-1480, 1994) have been shown to be effective vaccine antigens against their respective pathogens.

[0077] Polynucleotides encoding polypeptide fragments and polypeptides having large internal deletions can be constructed using standard methods (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons Inc., 1994), for example, by PCR, including inverse PCR, by restriction enzyme treatment of the cloned DNA molecules, or by the method of Kunkel et al. (Proc. Natl. Acad. Sci. USA 82:448, 1985; biological material available at Stratagene).

[0078] A polypeptide derivative can also be produced as a fusion polypeptide that contains a polypeptide or a polypeptide derivative of the invention fused, e.g., at the N- or C-terminal end, to any other polypeptide (hereinafter referred to as a peptide tail). Such a product can be easily obtained by translation of a genetic fusion, i.e., a hybrid gene. Vectors for expressing fusion polypeptides are commercially available, and include the pMal-c2 or pMal-p2 systems of New England Biolabs, in which the peptide tail is a maltose binding protein, the glutathione-S-transferase system of Pharmacia, or the His-Tag system available from Novagen. These and other expression systems provide convenient means for further purification of polypeptides and derivatives of the invention.

[0079] Another particular example of a fusion polypeptide included in the invention is a polypeptide or polypeptide derivative of the invention fused to a polypeptide having adjuvant activity, such as, e.g., subunit B of either cholera toxin or E. coli heat-labile toxin. Several possibilities can be used for producing such fusion proteins. First, the polypeptide of the invention can be fused to the N-terminal end or, preferably, to the C-terminal end of the polypeptide having adjuvant activity. Second, a polypeptide fragment of the invention can be fused within the amino acid sequence of the polypeptide having adjuvant activity. Spacer sequences can also be included, if desired.

[0080] As stated above, the polynucleotides of the invention encode Helicobacter polypeptides in precursor or mature form. They can also encode hybrid precursors containing heterologous signal peptides, which can mature into polypeptides of the invention. By “heterologous signal peptide” is meant a signal peptide that is not found in the naturally-occurring precursor of a polypeptide of the invention.

[0081] A polynucleotide of the invention hybridizes, preferably under stringent conditions, to a polynucleotide having a sequence as shown in any of SEQ ID NOs:1-23 (odd numbers). Hybridization procedures are, e.g., described by Ausubel et al. (supra); Silhavy et al. (Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1984); and Davis et al. (A Manual for Genetic Engineering: Advanced Bacterial Genetics, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1980). Important parameters that can be considered for optimizing hybridization conditions are reflected in the following formula, which facilitates calculation of the melting temperature (Tm), which is the temperature above which two complementary DNA strands separate from one another (Casey et al., Nucl. Acids Res. 4:1539, 1977): Tm = 81.5 + 0.5 × (% G+C) + 1.6 log (positive ion concentration) − 0.6 × (% formamide). Under appropriate stringency conditions, hybridization temperature (Th) is approximately 20 to 40° C., 20 to 25° C., or, preferably, 30 to 40° C. below the calculated Tm. Those skilled in the art will understand that optimal temperature and salt conditions can be readily determined empirically in preliminary experiments using conventional procedures. For example, stringent conditions can be achieved, both for pre-hybridizing and hybridizing incubations, (i) within 4-16 hours at 42° C., in 6× SSC containing 50% formamide or (ii) within 4-16 hours at 65° C. in an aqueous 6× SSC solution (1 M NaCl, 0.1 M sodium citrate (pH 7.0)). For polynucleotides containing 30 to 600 nucleotides, the above formula is used and then is corrected by subtracting (600/polynucleotide size in base pairs). Stringency conditions are defined by a Th that is 5 to 10° C. below Tm.
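
The melting temperature formula above, together with the length correction for polynucleotides of 30 to 600 nucleotides, can be written out as a small calculation. This sketch uses the coefficients exactly as they appear in the text and assumes that "log" means the base-10 logarithm and that the positive ion concentration is expressed in mol/L; the function and parameter names are hypothetical.

```python
import math


def melting_temperature(gc_percent, cation_molar, formamide_percent=0.0,
                        length_bp=None):
    """Tm in degrees C, using the coefficients as stated in the text.

    Assumptions: log is base 10 and the cation concentration is in mol/L.
    """
    if cation_molar <= 0:
        raise ValueError("cation concentration must be positive")
    tm = (81.5
          + 0.5 * gc_percent
          + 1.6 * math.log10(cation_molar)
          - 0.6 * formamide_percent)
    # Length correction described for polynucleotides of 30 to 600 nucleotides.
    if length_bp is not None and 30 <= length_bp <= 600:
        tm -= 600.0 / length_bp
    return tm


def hybridization_range(tm, below_min=20.0, below_max=40.0):
    """Return the (lowest, highest) Th for a range of degrees below Tm."""
    return tm - below_max, tm - below_min
```

The defaults of `hybridization_range` correspond to the 20 to 40° C. window mentioned in the text; other windows can be passed explicitly. In practice, as the text notes, the optimal temperature and salt conditions are determined empirically.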

[0082] Hybridization conditions with oligonucleotides shorter than 20-30 bases do not precisely follow the rules set forth above. In such cases, the formula for calculating the Tm is as follows: Tm=4×(G+C)+2×(A+T). For example, an 18 nucleotide fragment of 50% G+C would have an approximate Tm of 54° C.
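
The short-oligonucleotide rule of paragraph [0082] can be checked with a brief sketch. The function name tm_short and the example sequence are hypothetical; the sequence is simply an arbitrary 18-mer with 9 G/C and 9 A/T bases, chosen to reproduce the worked example in the text.

```python
def tm_short(sequence):
    """Approximate Tm (degrees C) for a short DNA oligonucleotide:
    Tm = 4 x (G+C) + 2 x (A+T), counting bases in the sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


if __name__ == "__main__":
    # Arbitrary 18-mer with 50% G+C (9 G/C bases and 9 A/T bases).
    print(tm_short("GCGCGCGCGATATATATA"))
```

For the 18-mer above, the calculation is 4×9+2×9, which matches the approximate Tm of 54° C. given in the example.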

[0083] A polynucleotide molecule of the invention, containing RNA, DNA, or modifications or combinations thereof, can have various applications. For example, a polynucleotide molecule can be used (i) in a process for producing the encoded polypeptide in a recombinant host system, (ii) in the construction of vaccine vectors such as poxviruses, which are further used in methods and compositions for preventing and/or treating Helicobacter infection, (iii) as a vaccine agent, in a naked form or formulated with a delivery vehicle, and (iv) in the construction of attenuated Helicobacter strains that can over-express a polynucleotide of the invention or express it in a non-toxic, mutated form.

[0084] According to a second aspect of the invention, there is therefore provided (i) an expression cassette containing a polynucleotide molecule of the invention placed under the control of elements (e.g., a promoter) required for expression; (ii) an expression vector containing an expression cassette of the invention; (iii) a procaryotic or eucaryotic cell transformed or transfected with an expression cassette and/or vector of the invention; and (iv) a process for producing a polypeptide or polypeptide derivative encoded by a polynucleotide of the invention, which involves culturing a procaryotic or eucaryotic cell transformed or transfected with an expression cassette and/or vector of the invention, under conditions that allow expression of the polynucleotide molecule of the invention, and recovering the encoded polypeptide or polypeptide derivative from the cell culture.

[0085] A recombinant expression system can be selected from procaryotic and eucaryotic hosts. Eucaryotic hosts include, for example, yeast cells (e.g., Saccharomyces cerevisiae or Pichia pastoris), mammalian cells (e.g., COS1, NIH3T3, or JEG3 cells), arthropod cells (e.g., Spodoptera frugiperda (SF9) cells), and plant cells. Preferably, a procaryotic host such as E. coli is used. Bacterial and eucaryotic cells are available from a number of different sources that are known to those skilled in the art, e.g., the American Type Culture Collection (ATCC; Rockville, Md.).

[0086] The choice of the expression cassette will depend on the host system selected, as well as the features desired for the expressed polypeptide. For example, it may be useful to produce a polypeptide of the invention in a particular lipidated form or any other form. Typically, an expression cassette includes a constitutive or inducible promoter that is functional in the selected host system; a ribosome binding site; a start codon (ATG); if necessary, a region encoding a signal peptide, e.g., a lipidation signal peptide; a polynucleotide molecule of the invention; a stop codon; and, optionally, a 3′ terminal region (translation and/or transcription terminator). The signal peptide-encoding region is adjacent to the polynucleotide of the invention and is placed in the proper reading frame. The signal peptide-encoding region can be homologous or heterologous to the polynucleotide molecule encoding the mature polypeptide and it can be specific to the secretion apparatus of the host used for expression. The open reading frame constituted by the polynucleotide molecule of the invention, alone or together with the signal peptide, is placed under the control of the promoter so that transcription and translation occur in the host system. Promoters and signal peptide-encoding regions are widely known and available to those skilled in the art and include, for example, the promoter of Salmonella typhimurium (and derivatives) that is inducible by arabinose (promoter araB) and is functional in Gram-negative bacteria such as E. coli (U.S. Pat. No. 5,028,530; Cagnon et al., Protein Engineering 4(7):843, 1991); the promoter of the bacteriophage T7 RNA polymerase gene, which is functional in a number of E. coli strains expressing T7 polymerase (U.S. Pat. No. 4,952,496); the OspA lipidation signal peptide; and the RlpB lipidation signal peptide (Takase et al., J. Bact. 169:5692, 1987).

[0087] The expression cassette is typically part of an expression vector, which is selected for its ability to replicate in the chosen expression system. Expression vectors (e.g., plasmids or viral vectors) can be chosen from, for example, those described in Pouwels et al. (Cloning Vectors: A Laboratory Manual, 1985, Supp. 1987) and can be purchased from various commercial sources. Methods for transforming or transfecting host cells with expression vectors are well known in the art and will depend on the host system selected, as described in Ausubel et al. (supra).

[0088] Upon expression, a recombinant polypeptide of the invention (or a polypeptide derivative) is produced and remains in the intracellular compartment, is secreted/excreted in the extracellular medium or in the periplasmic space, or is embedded in the cellular membrane. The polypeptide can then be recovered in a substantially purified form from the cell extract or from the supernatant after centrifugation of the cell culture. Typically, the recombinant polypeptide can be purified by antibody-based affinity purification or by any other method known to a person skilled in the art, such as by genetic fusion to a small affinity-binding domain. Antibody-based affinity purification methods are also available for purifying a polypeptide of the invention extracted from a Helicobacter strain. Antibodies useful for immunoaffinity purification of the polypeptides of the invention can be obtained using methods described below.

[0089] Polynucleotides of the invention can also be used in DNA vaccination methods, using either a viral or bacterial host as gene delivery vehicle (live vaccine vector) or administering the gene in a free form, e.g., inserted into a plasmid. Therapeutic or prophylactic efficacy of a polynucleotide of the invention can be evaluated as is described below.

[0090] Accordingly, in a third aspect of the invention, there is provided (i) a vaccine vector such as a poxvirus, containing a polynucleotide molecule of the invention placed under the control of elements required for expression; (ii) a composition of matter containing a vaccine vector of the invention, together with a diluent or carrier; (iii) a pharmaceutical composition containing a therapeutically or prophylactically effective amount of a vaccine vector of the invention; (iv) a method for inducing an immune response against Helicobacter in a mammal (e.g., a human; alternatively, the method can be used in veterinary applications for treating or preventing Helicobacter infection of animals, e.g., cats or birds), which involves administering to the mammal an immunogenically effective amount of a vaccine vector of the invention to elicit an immune response, e.g., a protective or therapeutic immune response to Helicobacter; and (v) a method for preventing and/or treating a Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, which involves administering a prophylactic or therapeutic amount of a vaccine vector of the invention to an individual in need. Additionally, the third aspect of the invention encompasses the use of a vaccine vector of the invention in the preparation of a medicament for preventing and/or treating Helicobacter infection.

[0091] A vaccine vector of the invention can express one or several polypeptides or derivatives of the invention, as well as at least one additional Helicobacter antigen such as a urease apoenzyme or a subunit, fragment, homolog, mutant, or derivative thereof. In addition, it can express a cytokine, such as interleukin-2 (IL-2) or interleukin-12 (IL-12), that enhances the immune response. Thus, a vaccine vector can include additional polynucleotide molecules encoding, e.g., urease subunit A, B, or both, or a cytokine, placed under the control of elements required for expression in a mammalian cell.

[0092] Alternatively, a composition of the invention can include several vaccine vectors, each of which is capable of expressing a polypeptide or derivative of the invention. A composition can also contain a vaccine vector capable of expressing an additional Helicobacter antigen such as urease apoenzyme, a subunit, fragment, homolog, mutant, or derivative thereof, or a cytokine such as IL-2 or IL-12.

[0093] In vaccination methods for treating or preventing infection in a mammal, a vaccine vector of the invention can be administered by any conventional route in use in the vaccine field, for example, to a mucosal (e.g., ocular, intranasal, oral, gastric, pulmonary, intestinal, rectal, vaginal, or urinary tract) surface or via a parenteral (e.g., subcutaneous, intradermal, intramuscular, intravenous, or intraperitoneal) route. Preferred routes depend upon the choice of the vaccine vector. The administration can be achieved in a single dose or repeated at intervals. The appropriate dosage depends on various parameters that are understood by those skilled in the art, such as the nature of the vaccine vector itself, the route of administration, and the condition of the mammal to be vaccinated (e.g., the weight, age, and general health of the mammal).

[0094] Live vaccine vectors that can be used in the invention include viral vectors, such as adenoviruses and poxviruses, as well as bacterial vectors, e.g., Shigella, Salmonella, Vibrio cholerae, Lactobacillus, Bacille bilié de Calmette-Guérin (BCG), and Streptococcus. An example of an adenovirus vector, as well as a method for constructing an adenovirus vector capable of expressing a polynucleotide molecule of the invention, is described in U.S. Pat. No. 4,920,209. Poxvirus vectors that can be used in the invention include, e.g., vaccinia and canary pox viruses, which are described in U.S. Pat. No. 4,722,848 and U.S. Pat. No. 5,364,773, respectively (also see, e.g., Tartaglia et al., Virology 188:217, 1992, for a description of a vaccinia virus vector, and Taylor et al., Vaccine 13:539, 1995, for a description of a canary poxvirus vector). Poxvirus vectors capable of expressing a polynucleotide of the invention can be obtained by homologous recombination, as described in Kieny et al. (Nature 312:163, 1984), so that the polynucleotide of the invention is inserted in the viral genome under appropriate conditions for expression in mammalian cells. Generally, the dose of viral vector vaccine, for therapeutic or prophylactic use, can be from about 1×10⁴ to about 1×10¹¹, advantageously from about 1×10⁷ to about 1×10¹⁰, or, preferably, from about 1×10⁷ to about 1×10⁹ plaque-forming units per kilogram. Preferably, viral vectors are administered parenterally, for example, in 3 doses that are 4 weeks apart. Those skilled in the art will recognize that it is preferable to avoid adding a chemical adjuvant to a composition containing a viral vector of the invention, thereby minimizing the immune response to the viral vector itself.

[0095] Non-toxigenic Vibrio cholerae mutant strains that can be used in live oral vaccines are described by Mekalanos et al. (Nature 306:551, 1983) and in U.S. Pat. No. 4,882,278 (strain in which a substantial amount of the coding sequence of each of the two ctxA alleles has been deleted so that no functional cholera toxin is produced); WO 92/11354 (strain in which the irgA locus is inactivated by mutation; this mutation can be combined in a single strain with ctxA mutations); and WO 94/1533 (deletion mutant lacking functional ctxA and attRS1 DNA sequences). These strains can be genetically engineered to express heterologous antigens, as described in WO 94/19482. An effective vaccine dose of a V. cholerae strain capable of expressing a polypeptide or polypeptide derivative encoded by a polynucleotide molecule of the invention can contain, e.g., about 1×10⁵ to about 1×10⁹, preferably about 1×10⁶ to about 1×10⁸ viable bacteria in an appropriate volume for the selected route of administration. Preferred routes of administration include all mucosal routes, but, most preferably, these vectors are administered intranasally or orally.

[0096] Attenuated Salmonella typhimurium strains, genetically engineered for recombinant expression of heterologous antigens, and their use as oral vaccines, are described by Nakayama et al. (Bio/Technology 6:693, 1988) and in WO 92/11361. Preferred routes of administration for these vectors include all mucosal routes. Most preferably, the vectors are administered intranasally or orally.

[0097] Other bacterial strains useful as vaccine vectors are described by High et al. (EMBO J. 11:1991, 1992) and Sizemore et al. (Science 270:299, 1995; Shigella flexneri); Medaglini et al. (Proc. Natl. Acad. Sci. USA 92:6868, 1995; Streptococcus gordonii); Flynn (Cell. Mol. Biol. 40 (suppl. I):31, 1994), and in WO 88/6626, WO 90/0594, WO 91/13157, WO 92/1796, and WO 92/21376 (Bacille Calmette Guerin). In bacterial vectors, a polynucleotide of the invention can be inserted into the bacterial genome or it can remain in a free state, for example, carried on a plasmid.

[0098] An adjuvant can also be added to a composition containing a bacterial vector vaccine. A number of adjuvants that can be used are known to those skilled in the art. For example, preferred adjuvants can be selected from the list provided below.

[0099] According to a fourth aspect of the invention, there is also provided (i) a composition of matter containing a polynucleotide of the invention, together with a diluent or carrier; (ii) a pharmaceutical composition containing a therapeutically or prophylactically effective amount of a polynucleotide of the invention; (iii) a method for inducing an immune response against Helicobacter, in a mammal, by administering to the mammal an immunogenically effective amount of a polynucleotide of the invention to elicit an immune response, e.g., a protective immune response to Helicobacter; and (iv) a method for preventing and/or treating a Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, by administering a prophylactic or therapeutic amount of a polynucleotide of the invention to an individual in need of such treatment. Additionally, the fourth aspect of the invention encompasses the use of a polynucleotide of the invention in the preparation of a medicament for preventing and/or treating Helicobacter infection. The fourth aspect of the invention preferably includes the use of a polynucleotide molecule placed under conditions for expression in a mammalian cell, e.g., in a plasmid that is unable to replicate in mammalian cells and to substantially integrate into a mammalian genome.

[0100] Polynucleotides (for example, DNA or RNA molecules) of the invention can also be administered as such to a mammal as a vaccine. When a DNA molecule of the invention is used, it can be in the form of a plasmid that is unable to replicate in a mammalian cell and unable to integrate into the mammalian genome. Typically, a DNA molecule is placed under the control of a promoter suitable for expression in a mammalian cell. The promoter can function ubiquitously or tissue-specifically. Examples of non-tissue specific promoters include the early Cytomegalovirus (CMV) promoter (U.S. Pat. No. 4,168,062) and the Rous Sarcoma Virus promoter (Norton et al., Molec. Cell Biol. 5:281, 1985). The desmin promoter (Li et al., Gene 78:243, 1989; Li et al., J. Biol. Chem. 266:6562, 1991; Li et al., J. Biol. Chem. 268:10403, 1993) is tissue-specific and drives expression in muscle cells. More generally, useful promoters and vectors are described, e.g., in WO 94/21797 and by Hartikka et al. (Human Gene Therapy 7:1205, 1996).

[0101] For DNA/RNA vaccination, the polynucleotide of the invention can encode a precursor or a mature form of a polypeptide of the invention. When it encodes a precursor form, the precursor sequence can be homologous or heterologous. In the latter case, a eucaryotic leader sequence can be used, such as the leader sequence of the tissue-type plasminogen activator (tPA).

[0102] A composition of the invention can contain one or several polynucleotides of the invention. It can also contain at least one additional polynucleotide encoding another Helicobacter antigen, such as urease subunit A, B, or both, or a fragment, derivative, mutant, or analog thereof. A polynucleotide encoding a cytokine, such as interleukin-2 (IL-2) or interleukin-12 (IL-12), can also be added to the composition so that the immune response is enhanced. These additional polynucleotides are placed under appropriate control for expression. Advantageously, DNA molecules of the invention and/or additional DNA molecules to be included in the same composition are carried in the same plasmid.

[0103] Standard methods can be used in the preparation of therapeutic polynucleotides of the invention. For example, a polynucleotide can be used in a naked form, free of any delivery vehicles, such as anionic liposomes, cationic lipids, microparticles, e.g., gold microparticles, precipitating agents, e.g., calcium phosphate, or any other transfection-facilitating agent. In this case, the polynucleotide can be simply diluted in a physiologically acceptable solution, such as sterile saline or sterile buffered saline, with or without a carrier. When present, the carrier preferably is isotonic, hypotonic, or weakly hypertonic, and has a relatively low ionic strength, such as provided by a sucrose solution, e.g., a solution containing 20% sucrose.

[0104] Alternatively, a polynucleotide can be associated with agents that assist in cellular uptake. It can be, e.g., (i) complemented with a chemical agent that modifies cellular permeability, such as bupivacaine (see, e.g., WO 94/16737), (ii) encapsulated into liposomes, or (iii) associated with cationic lipids or silica, gold, or tungsten microparticles.

[0105] Anionic and neutral liposomes are well-known in the art (see, e.g., Liposomes: A Practical Approach, RPC New Ed, IRL Press, 1990, for a detailed description of methods for making liposomes) and are useful for delivering a large range of products, including polynucleotides.

[0106] Cationic lipids can also be used for gene delivery. Such lipids include, for example, Lipofectin™, which is also known as DOTMA (N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium chloride), DOTAP (1,2-bis(oleyloxy)-3-(trimethylammonio)propane), DDAB (dimethyldioctadecylammonium bromide), DOGS (dioctadecylamidologlycyl spermine), and cholesterol derivatives. A description of these cationic lipids can be found in EP 187,702, WO 90/11092, U.S. Pat. No. 5,283,185, WO 91/15501, WO 95/26356, and U.S. Pat. No. 5,527,928. Cationic lipids for gene delivery are preferably used in association with a neutral lipid such as DOPE (dioleyl phosphatidylethanolamine; WO 90/11092). Other transfection-facilitating compounds can be added to a formulation containing cationic liposomes. A number of them are described in, e.g., WO 93/18759, WO 93/19768, WO 94/25608, and WO 95/2397. They include, e.g., spermine derivatives useful for facilitating the transport of DNA through the nuclear membrane (see, for example, WO 93/18759) and membrane-permeabilizing compounds such as GALA, Gramicidine S, and cationic bile salts (see, for example, WO 93/19768).

[0107] Gold or tungsten microparticles can also be used for gene delivery, as described in WO 91/359, WO 93/17706, and by Tang et al. (Nature 356:152, 1992). In this case, the microparticle-coated polynucleotides can be injected via intradermal or intraepidermal routes using a needleless injection device (“gene gun”), such as those described in U.S. Pat. No. 4,945,050, U.S. Pat. No. 5,015,580, and WO 94/24263.

[0108] The amount of DNA to be used in a vaccine recipient depends, e.g., on the strength of the promoter used in the DNA construct, the immunogenicity of the expressed gene product, the condition of the mammal intended for administration (e.g., the weight, age, and general health of the mammal), the mode of administration, and the type of formulation. In general, a therapeutically or prophylactically effective dose from about 1 μg to about 1 mg, preferably, from about 10 μg to about 800 μg, and, more preferably, from about 25 μg to about 250 μg, can be administered to human adults. The administration can be achieved in a single dose or repeated at intervals.

[0109] The route of administration can be any conventional route used in the vaccine field. As general guidance, a polynucleotide of the invention can be administered via a mucosal surface, e.g., an ocular, intranasal, pulmonary, oral, intestinal, rectal, vaginal, or urinary tract surface, or via a parenteral route, e.g., by an intravenous, subcutaneous, intraperitoneal, intradermal, intraepidermal, or intramuscular route. The choice of administration route will depend on, e.g., the formulation that is selected. A polynucleotide formulated in association with bupivacaine is advantageously administered into muscle. When a neutral or anionic liposome or a cationic lipid, such as DOTMA, is used, the formulation can be advantageously injected via intravenous, intranasal (for example, by aerosolization), intramuscular, intradermal, and subcutaneous routes. A polynucleotide in a naked form can advantageously be administered via the intramuscular, intradermal, or subcutaneous routes. Although not absolutely required, such a composition can also contain an adjuvant. A systemic adjuvant that does not require concomitant administration in order to exhibit an adjuvant effect is preferable.

[0110] The sequence information provided in the present application enables the design of specific nucleotide probes and primers that can be used in diagnostic methods. Accordingly, in a fifth aspect of the invention, there is provided a nucleotide probe or primer having a sequence found in or derived by degeneracy of the genetic code from a sequence shown in SEQ ID NOs:1-23 (odd numbers).

[0111] The term “probe” as used in the present application refers to DNA (preferably single stranded) or RNA molecules (or modifications or combinations thereof) that hybridize under stringent conditions, as defined above, to polynucleotide molecules having sequences homologous to any of those shown in SEQ ID NOs: 1-23 (odd numbers), or to a complementary or anti-sense sequence of SEQ ID NOs: 1-23 (odd numbers). Generally, probes are significantly shorter than the full-length sequences shown in SEQ ID NOs: 1-23 (odd numbers). For example, they can contain from about 5 to about 100, preferably from about 10 to about 80 nucleotides. In particular, probes have sequences that are at least 75%, preferably at least 85%, more preferably 95% homologous to a portion of a sequence as shown in SEQ ID NOs: 1-23 (odd numbers) or a sequence complementary to such sequences.

[0112] Probes can contain modified bases, such as inosine, methyl-5-deoxycytidine, deoxyuridine, dimethylamino-5-deoxyuridine, or diamino-2,6-purine. Sugar or phosphate residues can also be modified or substituted. For example, a deoxyribose residue can be replaced by a polyamide (Nielsen et al., Science 254:1497, 1991) and phosphate residues can be replaced by ester groups such as diphosphate, alkyl, arylphosphonate, and phosphorothioate esters. In addition, the 2′-hydroxyl group on ribonucleotides can be modified by addition of, e.g., alkyl groups.

[0113] Probes of the invention can be used in diagnostic tests, or as capture or detection probes. Such capture probes can be immobilized on solid supports, directly or indirectly, by covalent means or by passive adsorption. A detection probe can be labeled by a detectable label, for example a label selected from radioactive isotopes; enzymes, such as peroxidase and alkaline phosphatase; enzymes that are able to hydrolyze a chromogenic, fluorogenic, or luminescent substrate; compounds that are chromogenic, fluorogenic, or luminescent; nucleotide base analogs; and biotin.

[0114] Probes of the invention can be used in any conventional hybridization method, such as in dot blot methods (Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1982), Southern blot methods (Southern, J. Mol. Biol. 98:503, 1975), northern blot methods (identical to Southern blot with the exception that RNA is used as a target), or a sandwich method (Dunn et al., Cell 12:23, 1977). As is known in the art, the latter technique involves the use of a specific capture probe and a specific detection probe that have nucleotide sequences that are at least partially different from each other.

[0115] Primers used in the invention usually contain about 10 to 40 nucleotides and are used to initiate enzymatic polymerization of DNA in an amplification process (e.g., PCR), an elongation process, or a reverse transcription method. In a diagnostic method involving PCR, the primers can be labeled.

[0116] Thus, the invention also encompasses (i) a reagent containing a probe of the invention for detecting and/or identifying the presence of Helicobacter in a biological material; (ii) a method for detecting and/or identifying the presence of Helicobacter in a biological material, in which (a) a sample is recovered or derived from the biological material, (b) DNA or RNA is extracted from the material and denatured, and (c) the sample is exposed to a probe of the invention, for example, a capture probe, a detection probe, or both, under stringent hybridization conditions, so that hybridization is detected; and (iii) a method for detecting and/or identifying the presence of Helicobacter in a biological material, in which (a) a sample is recovered or derived from the biological material, (b) DNA is extracted therefrom, (c) the extracted DNA is contacted with at least one, or, preferably two, primers of the invention, and amplified by the polymerase chain reaction, and (d) an amplified DNA molecule is produced.

[0117] As mentioned above, polypeptides that can be produced by expression of the polynucleotides of the invention can be used as vaccine antigens. Accordingly, a sixth aspect of the invention features a substantially purified polypeptide or polypeptide derivative having an amino acid sequence encoded by a polynucleotide of the invention.

[0118] A “substantially purified polypeptide” is defined as a polypeptide that is separated from the environment in which it naturally occurs and/or a polypeptide that is free of most of the other polypeptides that are present in the environment in which it was synthesized. The polypeptides of the invention can be purified from a natural source, such as a Helicobacter strain, or can be produced using recombinant methods.

[0119] Homologous polypeptides or polypeptide derivatives encoded by polynucleotides of the invention can be screened for specific antigenicity by testing cross-reactivity with an antiserum raised against a polypeptide having an amino acid sequence as shown in any of SEQ ID NOs:2-24 (even numbers). Briefly, a monospecific hyperimmune antiserum can be raised against a purified reference polypeptide as such or as a fusion polypeptide, for example, an expression product of MBP, GST, or His-tag systems, or a synthetic peptide predicted to be antigenic. The homologous polypeptide or derivative that is screened for specific antigenicity can be produced as such or as a fusion polypeptide. In the latter case, and if the antiserum is also raised against a fusion polypeptide, two different fusion systems are employed. Specific antigenicity can be determined using a number of methods, including Western blot (Towbin et al., Proc. Natl. Acad. Sci. USA 76:4350, 1979), dot blot, and ELISA methods, as described below.

[0120] In a Western blot assay, the product to be screened, either as a purified preparation or a total E. coli extract, is fractionated by SDS-PAGE, as described, for example, by Laemmli (Nature 227:680, 1970). After being transferred to a filter, such as a nitrocellulose membrane, the material is incubated with the monospecific hyperimmune antiserum, which is diluted in a range of dilutions from about 1:50 to about 1:5000, preferably from about 1:100 to about 1:500. Specific antigenicity is shown once a band corresponding to the product exhibits reactivity at any of the dilutions in the range.

[0121] In an ELISA assay, the product to be screened can be used as the coating antigen. A purified preparation is preferred, but a whole cell extract can also be used. Briefly, about 100 μL of a preparation of about 10 μg protein/mL is distributed into wells of a 96-well ELISA plate. The plate is incubated for about 2 hours at 37° C., then overnight at 4° C. The plate is washed with phosphate-buffered saline (PBS) containing 0.05% Tween 20 (PBS/Tween buffer) and the wells are saturated with 250 μL PBS containing 1% bovine serum albumin (BSA), to prevent non-specific antibody binding. After 1 hour of incubation at 37° C., the plate is washed with PBS/Tween buffer. The antiserum is serially diluted in PBS/Tween buffer containing 0.5% BSA, and 100 μL of each dilution is added to a well. The plate is incubated for 90 minutes at 37° C., washed, and evaluated using standard methods. For example, a goat anti-rabbit peroxidase conjugate can be added to the wells when the specific antibodies used were raised in rabbits. Incubation is carried out for about 90 minutes at 37° C. and the plate is washed. The reaction is developed with the appropriate substrate and the reaction is measured by colorimetry (absorbance measured spectrophotometrically). Under these experimental conditions, a positive reaction is shown once an O.D. value of 1.0 is detected with a dilution of at least about 1:50, preferably of at least about 1:500.
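
The arithmetic and the read-out criterion of paragraph [0121] can be sketched as follows. The function names, the dictionary layout of the readings, and the interpretation of "an O.D. value of 1.0 is detected" as an O.D. greater than or equal to 1.0 are assumptions made for illustration only and are not part of the described method.

```python
def coated_protein_ug(volume_ul=100.0, conc_ug_per_ml=10.0):
    """Amount of protein (micrograms) distributed into one well."""
    return volume_ul / 1000.0 * conc_ug_per_ml


def elisa_positive(od_by_dilution, min_dilution=50, od_cutoff=1.0):
    """Return True if any antiserum dilution of at least 1:min_dilution
    gives an O.D. at or above od_cutoff.

    od_by_dilution maps the dilution factor (e.g., 50 for 1:50) to the
    measured O.D.; this layout is a hypothetical convenience.
    """
    return any(dilution >= min_dilution and od >= od_cutoff
               for dilution, od in od_by_dilution.items())


if __name__ == "__main__":
    print(coated_protein_ug())                      # protein per well, in micrograms
    readings = {50: 1.4, 500: 0.6, 5000: 0.1}       # made-up illustrative values
    print(elisa_positive(readings))                 # criterion at 1:50
    print(elisa_positive(readings, min_dilution=500))  # preferred criterion at 1:500
```

With the volumes given in the paragraph, each well receives about 1 μg of protein; the readings in the usage example are invented solely to exercise the function.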

[0122] In a dot blot assay, a purified product is preferred, although a whole cell extract can be used. Briefly, a solution of the product at a concentration of about 100 μg/mL is serially diluted two-fold with 50 mM Tris-HCl (pH 7.5). One hundred μL of each dilution is applied to a filter, such as a 0.45 μm nitrocellulose membrane, set in a 96-well dot blot apparatus (Biorad). The buffer is removed by applying vacuum to the system. Wells are washed by addition of 50 mM Tris-HCl (pH 7.5) and the membrane is air-dried. The membrane is saturated in blocking buffer (50 mM Tris-HCl (pH 7.5), 0.15 M NaCl, 10 g/L skim milk) and incubated with an antiserum diluted from about 1:50 to about 1:5000, preferably about 1:500. The reaction is detected using standard methods. For example, a goat anti-rabbit peroxidase conjugate can be added to the wells when rabbit antibodies are used. Incubation is carried out for about 90 minutes at 37° C. and the blot is washed. The reaction is developed with the appropriate substrate and stopped. The reaction is then measured visually by the appearance of a colored spot, e.g., by colorimetry. Under these experimental conditions, a positive reaction is associated with detection of a colored spot for reactions carried out with a dilution of at least about 1:50, preferably, of at least about 1:500. Therapeutic or prophylactic efficacy of a polypeptide or polypeptide derivative of the invention can be evaluated as described below.

[0123] According to a seventh aspect of the invention, there is provided (i) a composition of matter containing a polypeptide of the invention together with a diluent or carrier; (ii) a pharmaceutical composition containing a therapeutically or prophylactically effective amount of a polypeptide of the invention; (iii) a method for inducing an immune response against Helicobacter in a mammal by administering to the mammal an immunogenically effective amount of a polypeptide of the invention to elicit an immune response, e.g., a protective immune response to Helicobacter; and (iv) a method for preventing and/or treating a Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, by administering a prophylactic or therapeutic amount of a polypeptide of the invention to an individual in need of such treatment. Additionally, this aspect of the invention includes the use of a polypeptide of the invention in the preparation of a medicament for preventing and/or treating Helicobacter infection.

[0124] The immunogenic compositions of the invention can be administered by any conventional route in use in the vaccine field, for example, to a mucosal (e.g., ocular, intranasal, pulmonary, oral, gastric, intestinal, rectal, vaginal, or urinary tract) surface or via a parenteral (e.g., subcutaneous, intradermal, intramuscular, intravenous, or intraperitoneal) route. The choice of the administration route depends upon a number of parameters, such as the adjuvant used. For example, if a mucosal adjuvant is used, the intranasal or oral route will be preferred, and if a lipid formulation or an aluminum compound is used, a parenteral route will be preferred. In the latter case, the subcutaneous or intramuscular route is most preferred. The choice of administration route can also depend upon the nature of the vaccine agent. For example, a polypeptide of the invention fused to CTB or to LTB will be best administered to a mucosal surface.

[0125] A composition of the invention can contain one or several polypeptides or derivatives of the invention. It can also contain at least one additional Helicobacter antigen, such as the urease apoenzyme, or a subunit, fragment, homolog, mutant, or derivative thereof.

[0126] For use in a composition of the invention, a polypeptide or polypeptide derivative can be formulated into or with liposomes, such as neutral or anionic liposomes, microspheres, ISCOMS, or virus-like particles (VLPs), to facilitate delivery and/or enhance the immune response. These compounds are readily available to those skilled in the art; for example, see Liposomes: A Practical Approach (supra). Adjuvants other than liposomes can also be used in the invention and are well known in the art (see, for example, the list provided below).

[0127] Administration can be achieved in a single dose or repeated as necessary at intervals that can be determined by one skilled in the art. For example, a priming dose can be followed by three booster doses at weekly or monthly intervals. An appropriate dose depends on various parameters, including the nature of the recipient (e.g., whether the recipient is an adult or an infant), the particular vaccine antigen, the route and frequency of administration, the presence/absence or type of adjuvant, and the desired effect (e.g., protection and/or treatment), and can be readily determined by one skilled in the art. In general, a vaccine antigen of the invention can be administered mucosally in an amount ranging from about 10 μg to about 500 mg, preferably from about 1 mg to about 200 mg. For a parenteral route of administration, the dose usually should not exceed about 1 mg, and is, preferably, about 100 μg.

[0128] When used as components of a vaccine, the polynucleotides and polypeptides of the invention can be used sequentially as part of a multi-step immunization process. For example, a mammal can be initially primed with a vaccine vector of the invention, such as a pox virus, e.g., via a parenteral route, and then boosted twice with a polypeptide encoded by the vaccine vector, e.g., via the mucosal route. In another example, liposomes associated with a polypeptide or polypeptide derivative of the invention can be used for priming, with boosting being carried out mucosally using a soluble polypeptide or polypeptide derivative of the invention, in combination with a mucosal adjuvant (e.g., LT).

[0129] Polypeptides and polypeptide derivatives of the invention can also be used as diagnostic reagents for detecting the presence of anti-Helicobacter antibodies, e.g., in blood samples. Such polypeptides can be about 5 to about 80, preferably, about 10 to about 50 amino acids in length and can be labeled or unlabeled, depending upon the diagnostic method. Diagnostic methods involving such a reagent are described below.

[0130] Upon expression of a polynucleotide molecule of the invention, a polypeptide or polypeptide derivative is produced and can be purified using known methods. For example, the polypeptide or polypeptide derivative can be produced as a fusion protein containing a fused tail that facilitates purification. The fusion product can be used to immunize a small mammal, e.g., a mouse or a rabbit, in order to raise monospecific antibodies against the polypeptide or polypeptide derivative. The eighth aspect of the invention thus provides a monospecific antibody that binds to a polypeptide or polypeptide derivative of the invention.

[0131] By “monospecific antibody” is meant an antibody that is capable of reacting with a unique, naturally-occurring Helicobacter polypeptide. An antibody of the invention can be polyclonal or monoclonal. Monospecific antibodies can be recombinant, e.g., chimeric (e.g., consisting of a variable region of murine origin and a human constant region), humanized (e.g., a human immunoglobulin constant region and a variable region of animal, e.g., murine, origin), and/or single chain. Both polyclonal and monospecific antibodies can also be in the form of immunoglobulin fragments, e.g., F(ab)′2 or Fab fragments. The antibodies of the invention can be of any isotype, e.g., IgG or IgA, and polyclonal antibodies can be of a single isotype or can contain a mixture of isotypes.

[0132] The antibodies of the invention, which can be raised to a polypeptide or polypeptide derivative of the invention, can be produced and identified using standard immunological assays, e.g., Western blot assays, dot blot assays, or ELISA (see, e.g., Coligan et al., Current Protocols in Immunology, John Wiley & Sons, Inc., New York, N.Y., 1994). The antibodies can be used in diagnostic methods to detect the presence of Helicobacter antigens in a sample, such as a biological sample. The antibodies can also be used in affinity chromatography methods for purifying a polypeptide or polypeptide derivative of the invention. As is discussed further below, the antibodies can also be used in prophylactic and therapeutic passive immunization methods.

[0133] Accordingly, a ninth aspect of the invention provides (i) a reagent for detecting the presence of Helicobacter in a biological sample that contains an antibody, polypeptide, or polypeptide derivative of the invention; and (ii) a diagnostic method for detecting the presence of Helicobacter in a biological sample, by contacting the biological sample with an antibody, a polypeptide, or a polypeptide derivative of the invention, so that an immune complex is formed, and detecting the complex as an indication of the presence of Helicobacter in the sample or the organism from which the sample was derived. The immune complex is formed between a component of the sample and the antibody, polypeptide, or polypeptide derivative, and any unbound material can be removed prior to detecting the complex. A polypeptide reagent can be used for detecting the presence of anti-Helicobacter antibodies in a sample, e.g., a blood sample, while an antibody of the invention can be used for screening a sample, such as a gastric extract or biopsy sample, for the presence of Helicobacter polypeptides.

[0134] For use in diagnostic methods, the reagent (e.g., the antibody, polypeptide, or polypeptide derivative of the invention) can be in a free state or can be immobilized on a solid support, such as, for example, on the interior surface of a tube or on the surface, or within pores, of a bead. Immobilization can be achieved using direct or indirect means. Direct means include passive adsorption (i.e., non-covalent binding) or covalent binding between the support and the reagent. By “indirect means” is meant that an anti-reagent compound that interacts with the reagent is first attached to the solid support. For example, if a polypeptide reagent is used, an antibody that binds to it can serve as an anti-reagent, provided that it binds to an epitope that is not involved in recognition of antibodies in biological samples. Indirect means can also employ a ligand-receptor system; for example, a molecule, such as a vitamin, can be grafted onto the polypeptide reagent and the corresponding receptor can be immobilized on the solid phase. This concept is illustrated by the well known biotin-streptavidin system. Alternatively, indirect means can be used, e.g., by adding to the reagent a peptide tail, chemically or by genetic engineering, and immobilizing the grafted or fused product by passive adsorption or covalent linkage of the peptide tail.

[0135] According to a tenth aspect of the invention, there is provided a process for purifying, from a biological sample, a polypeptide or polypeptide derivative of the invention, which involves carrying out antibody-based affinity chromatography with the biological sample, wherein the antibody is a monospecific antibody of the invention.

[0136] For use in a purification process of the invention, the antibody can be polyclonal or monospecific, and preferably is of the IgG type. Purified IgGs can be prepared from an antiserum using standard methods (see, e.g., Coligan et al., supra). Conventional chromatography supports, as well as standard methods for grafting antibodies, are described, for example, by Harlow et al. (Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1988).

[0137] Briefly, a biological sample, such as an H. pylori extract, preferably in a buffer solution, is applied to a chromatography material, which is, preferably, equilibrated with the buffer used to dilute the biological sample, so that the polypeptide or polypeptide derivative of the invention (i.e., the antigen) is allowed to adsorb onto the material. The chromatography material, such as a gel or a resin coupled to an antibody of the invention, can be in batch form or in a column. The unbound components are washed off and the antigen is eluted with an appropriate elution buffer, such as a glycine buffer, a buffer containing a chaotropic agent, e.g., guanidine HCl, or a buffer having high salt concentration (e.g., 3 M MgCl₂). Eluted fractions are recovered and the presence of the antigen is detected, e.g., by measuring the absorbance at 280 nm.

[0138] An antibody of the invention can be screened for therapeutic efficacy as follows. According to an eleventh aspect of the invention, there is provided (i) a composition of matter containing a monospecific antibody of the invention, together with a diluent or carrier; (ii) a pharmaceutical composition containing a therapeutically or prophylactically effective amount of a monospecific antibody of the invention, and (iii) a method for treating or preventing Helicobacter (e.g., H. pylori, H. felis, H. mustelae, or H. heilmanii) infection, by administering a therapeutic or prophylactic amount of a monospecific antibody of the invention to an individual in need of such treatment. In addition, the eleventh aspect of the invention includes the use of a monospecific antibody of the invention in the preparation of a medicament for treating or preventing Helicobacter infection.

[0139] The monospecific antibody can be polyclonal or monoclonal, and is, preferably, predominantly of the IgA isotype. In passive immunization methods, the antibody is administered to a mucosal surface of a mammal, e.g., the gastric mucosa, e.g., orally or intragastrically, optionally, in the presence of a bicarbonate buffer. Alternatively, systemic administration, not requiring a bicarbonate buffer, can be carried out. A monospecific antibody of the invention can be administered as a single active agent or as a mixture with at least one additional monospecific antibody specific for a different Helicobacter polypeptide. The amount of antibody and the particular regimen used can be readily determined by one skilled in the art. For example, daily administration of about 100 to 1,000 mg of antibody over one week, or three doses per day of about 100 to 1,000 mg of antibody over two or three days, can be effective regimens for most purposes.

[0140] Therapeutic or prophylactic efficacy can be evaluated using standard methods in the art, e.g., by measuring induction of a mucosal immune response or induction of protective and/or therapeutic immunity, using, e.g., the H. felis mouse model and the procedures described by Lee et al. (Eur. J. Gastroenterology & Hepatology 7:303, 1995) or Lee et al. (J. Infect. Dis. 172:161, 1995). Those skilled in the art will recognize that the H. felis strain of the model can be replaced with another Helicobacter strain. For example, the efficacy of polynucleotide molecules and polypeptides from H. pylori is, preferably, evaluated in a mouse model using an H. pylori strain. Protection can be determined by comparing the degree of Helicobacter infection in the gastric tissue, assessed by, for example, urease activity, bacterial counts, or gastritis, to that of a control group. Protection is shown when infection is reduced by comparison to the control group. Such an evaluation can be made for polynucleotides, vaccine vectors, polypeptides, and polypeptide derivatives, as well as for antibodies of the invention.

[0141] For example, various doses of an antibody of the invention can be administered to the gastric mucosa of mice previously challenged with an H. pylori strain, as described, e.g., by Lee et al. (supra). Then, after an appropriate period of time, the bacterial load of the mucosa can be estimated by assessing urease activity, as compared to a control. Reduced urease activity indicates that the antibody is therapeutically effective.

[0142] Adjuvants that can be used in any of the vaccine compositions described above are described as follows. Adjuvants for parenteral administration include, for example, aluminum compounds, such as aluminum hydroxide, aluminum phosphate, and aluminum hydroxy phosphate. The antigen can be precipitated with, or adsorbed onto, the aluminum compound using standard methods. Other adjuvants, such as RIBI (ImmunoChem, Hamilton, Mont.), can also be used in parenteral administration.

[0143] Adjuvants that can be used for mucosal administration include, for example, bacterial toxins, e.g., the cholera toxin (CT), the E. coli heat-labile toxin (LT), the Clostridium difficile toxin A, the pertussis toxin (PT), and combinations, subunits, toxoids, or mutants thereof. For example, a purified preparation of native cholera toxin subunit B (CTB) can be used. Fragments, homologs, derivatives, and fusions to any of these toxins can also be used, provided that they retain adjuvant activity. Preferably, a mutant having reduced toxicity is used. Suitable mutants are described, e.g., in WO 95/17211 (Arg-7-Lys CT mutant), WO 96/6627 (Arg-192-Gly LT mutant), and WO 95/34323 (Arg-9-Lys and Glu-129-Gly PT mutant). Additional LT mutants that can be used in the methods and compositions of the invention include, e.g., Ser-63-Lys, Ala-69-Gly, Glu-110-Asp, and Glu-112-Asp mutants. Other adjuvants, such as the bacterial monophosphoryl lipid A (MPLA) of, e.g., E. coli, Salmonella minnesota, Salmonella typhimurium, or Shigella flexneri; saponins; and polylactide glycolide (PLGA) microspheres, can also be used in mucosal administration. Adjuvants useful for both mucosal and parenteral administrations, such as polyphosphazene (WO 95/2415), can also be used.

[0144] Any pharmaceutical composition of the invention, containing a polynucleotide, polypeptide, polypeptide derivative, or antibody of the invention, can be manufactured using standard methods. It can be formulated with a pharmaceutically acceptable diluent or carrier, e.g., water or a saline solution, such as phosphate-buffered saline, optionally, including a bicarbonate salt, such as sodium bicarbonate, e.g., 0.1 to 0.5 M. Bicarbonate can advantageously be added to compositions intended for oral or intragastric administration. In general, a diluent or carrier can be selected on the basis of the mode and route of administration, and standard pharmaceutical practice. Suitable pharmaceutical carriers and diluents, as well as pharmaceutical necessities for their use in pharmaceutical formulations, are described in Remington's Pharmaceutical Sciences, a standard reference text in this field, and in the USP/NF.

[0145] The invention also includes methods in which gastroduodenal infections, such as Helicobacter infection, are treated by oral administration of a Helicobacter polypeptide of the invention and a mucosal adjuvant, in combination with an antibiotic, an antisecretory agent, a bismuth salt, an antacid, sucralfate, or a combination thereof. Examples of such compounds that can be administered with the vaccine antigen and an adjuvant are antibiotics, including, e.g., macrolides, tetracyclines, β-lactams, aminoglycosides, quinolones, penicillins, and derivatives thereof (specific examples of antibiotics that can be used in the invention include, e.g., amoxicillin, clarithromycin, tetracycline, metronidazole, erythromycin, and cefuroxime); antisecretory agents, including, e.g., H₂-receptor antagonists (e.g., cimetidine, ranitidine, famotidine, nizatidine, and roxatidine), proton pump inhibitors (e.g., omeprazole, lansoprazole, and pantoprazole), prostaglandin analogs (e.g., misoprostol and enprostil), and anticholinergic agents (e.g., pirenzepine, telenzepine, carbenoxolone, and proglumide); and bismuth salts, including colloidal bismuth subcitrate, tripotassium dicitrate bismuthate, bismuth subsalicylate, bicitropeptide, and Pepto-Bismol (see, e.g., Goodwin et al., Helicobacter pylori, Biology and Clinical Practice, CRC Press, Boca Raton, Fla., pp 366-395, 1993; Physicians' Desk Reference, 49th edn., Medical Economics Data Production Company, Montvale, N.J., 1995). In addition, compounds containing more than one of the above-listed components coupled together, e.g., ranitidine coupled to bismuth subcitrate, can be used. The invention also includes compositions for carrying out these methods, i.e., compositions containing a Helicobacter antigen (or antigens) of the invention, an adjuvant, and one or more of the above-listed compounds, in a pharmaceutically acceptable carrier or diluent.

[0146] Amounts of the above-listed compounds used in the methods and compositions of the invention can readily be determined by one skilled in the art. In addition, one skilled in the art can readily design treatment/immunization schedules. For example, the non-vaccine components can be administered on days 1-14, and the vaccine antigen+adjuvant can be administered on days 7, 14, 21, and 28.

[0147] Methods and pharmaceutical compositions of the invention can be used to treat or to prevent Helicobacter infections and, accordingly, gastroduodenal diseases associated with these infections, including acute, chronic, and atrophic gastritis, and peptic ulcer diseases, e.g., gastric and duodenal ulcers.

[0148] All twelve clones of the invention were originally isolated by a transposon shuttle mutagenesis method. Briefly, in this method, a TnMax9 mini-blaM transposon was used for insertional mutagenesis of an H. pylori gene library established in E. coli. 192 E. coli clones expressing active β-lactamase fusion proteins were obtained, indicating that the corresponding target plasmids carry H. pylori genes encoding extracytoplasmic proteins. Individual mutants were transferred onto the chromosome of H. pylori P1 or P12 by natural transformation, resulting in 135 distinct H. pylori mutants. This method is described in further detail, as follows.

[0149] The transposon TnMax9 (Kahrs et al., Gene 167:53, 1995) was used to generate mutations in an H. pylori library in E. coli. As illustrated in FIG. 1A, TnMax9 contains, in addition to a cat_(GC)-resistance gene close to the inverted repeat (IR), an unexpressed open reading frame encoding β-lactamase without a promoter or leader sequence (mature β-lactamase, blaM; Kahrs et al., supra). For production of extracytoplasmic BlaM fusion proteins resulting in ampicillin resistant (amp^(R)) clones, expression of the cloned H. pylori genes in E. coli is obligatory. The minimal vector pMin2 (Kahrs et al., supra; see FIG. 1B), containing a weak constitutive promoter (P_(iga)) upstream of the multiple cloning site, was used for construction of the H. pylori library to ensure expression of H. pylori genes in E. coli.

[0150] In construction of the library, H. pylori DNA was partially digested with Sau3A and HpaII, size fractionated by preparative agarose gel electrophoresis, and 3-6 kb fragments were ligated into the BglII and ClaI sites of pMin2. The library was introduced into E. coli strain E181(pTnMax9), which is a derivative of HB101 containing the TnMax9 transposon, by electroporation. This generated approximately 2,400 independent transformants. More than 95% of the plasmids contained an insert of between 3 and 6 kb, showing that the 1.7 Mb H. pylori chromosome was statistically covered. Since not every plasmid could be expected to contain a target gene carrying an export signal, the library was partitioned into a total of 198 pools (24 pools of 20 clones and 174 pools of 11 clones). Using a cotton swab, either eleven or twenty individual colonies were inoculated in 0.5 mL LB medium in an Eppendorf tube, vortexed, and 100 μL of the suspension was spread on LB agar plates supplemented with tetracycline and chloramphenicol to select for maintenance of both plasmids. Insertion of TnMax9 into the target plasmids was induced with 100 mM isopropyl-β-D-thiogalactoside (IPTG) separately for each pool (Haas et al., Gene 130:23-31, 1993). Plasmids were transferred into E145 by triparental mating, in which 25 μL of the donor strain (E181), 25 μL of the mobilizer strain (HB101(pRK2013)), and 50 μL of the recipient strain (E145) were mixed from corresponding bacterial suspensions (O.D.₅₅₀=10). The matings were performed for 2-3 hours at 37° C. on nitrocellulose filters, which were placed on LB plates. Bacteria were suspended in 1 mL LB and aliquots were spread on LB plates containing chloramphenicol, tetracycline, and rifampicin. Each pool gave rise to chloramphenicol-resistant transconjugants in E145, demonstrating that both transposition and conjugation were successful. Generally, several thousand chloramphenicol-resistant transconjugants were obtained, but the number of amp^(R) colonies varied in different pools, ranging from one to several hundred colonies. Two amp^(R) colonies from each positive pool were isolated, plasmid DNA was extracted, and the DNA was characterized by further restriction analysis. Only those TnMax9 insertions of a single pool that mapped in obviously different plasmid clones, or in markedly different regions of the same clone, were used further.
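The statement that the chromosome was statistically covered can be checked with a short calculation. The sketch below uses the Clarke-Carbon estimate, which is not named in the text and is introduced here only as a common way to express library representation. The clone number (about 2,400), genome size (1.7 Mb), and pool layout are taken from the description; the mean insert size of 4.5 kb is an assumption (the midpoint of the 3-6 kb range), and all function and variable names are illustrative.

```python
def fold_coverage(n_clones, mean_insert_bp, genome_bp):
    """Total cloned DNA expressed as multiples of the genome length."""
    return n_clones * mean_insert_bp / genome_bp


def probability_represented(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon estimate that a given locus is present in the library."""
    fraction = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


N_CLONES = 2400            # approximate number of independent transformants
GENOME_BP = 1_700_000      # 1.7 Mb chromosome
MEAN_INSERT_BP = 4500      # assumption: midpoint of the 3-6 kb insert range
POOLS = {20: 24, 11: 174}  # clones per pool -> number of pools

clones_in_pools = sum(size * count for size, count in POOLS.items())
number_of_pools = sum(POOLS.values())
coverage = fold_coverage(N_CLONES, MEAN_INSERT_BP, GENOME_BP)
p_any = probability_represented(N_CLONES, MEAN_INSERT_BP, GENOME_BP)

print(f"pools: {number_of_pools}, clones in pools: {clones_in_pools}")
print(f"fold coverage: {coverage:.2f}")
print(f"probability a given locus is represented: {p_any:.4f}")
```

Under these assumptions, the pooled clones (24 × 20 + 174 × 11) account for nearly all of the approximately 2,400 transformants, and the library carries several genome equivalents of DNA.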

[0151] From 158 of the 198 pools, ampicillin-resistant E145 transconjugates were obtained (80%), showing that in several pools, TnMax9 inserted into expressed genes, resulting in production of extracytoplasmic BlaM fusion proteins. Thus, a total of 192 amp^(R) E145 clones could be isolated by conjugal transfer of plasmids from 198 pools.

[0152] To analyze the mutant library, it was determined whether defined gene sequences inactivated by TnMax9 were represented once or several times in the whole library. Five transposon-containing plasmids conferring an amp^(R) phenotype to E145 (pMu7, pMu13, pMu75, pMu94, and pMu110) were randomly selected and DNA fragments flanking the TnMax9 insert were isolated and used as probes in Southern hybridization of 120 amp^(R) clones. The hybridization probes isolated from clones pMu7, pMu75, and pMu94 were between 0.9 and 1.1 kb in size, and hybridized exclusively with the inserts of the homologous plasmids. In contrast, the TnMax9 flanking regions of clones pMu13 and pMu110 were 4.0 kb and 5.5 kb, respectively. They each hybridized with the homologous plasmids, and with one additional clone of the library. Such a result was expected, since the longer the hybridization probe, the higher the chance that it will find a homologous sequence in the library.

[0153] In order to verify the insertion of the transposon into distinct ORFs encoding putative exported proteins, the TnMax9-flanking DNA of five representative amp^(R) mutant clones (pMu7, pMu12, pMu18, pMu20, and pMu26) was sequenced, taking advantage of the M13 forward and reverse primers on TnMax9 (FIG. 1A). This analysis revealed that the mini-transposon was inserted into different sequences in each plasmid, thereby interrupting ORFs encoding putative proteins. For two clones, the sequences located upstream of the blaM gene revealed a putative ribosome-binding site and a potential translational start codon (ATG). Other clones either revealed an ORF spanning the complete sequence (approximately 400 base pairs upstream and downstream of the TnMax9 insertion) or terminating shortly after the site of TnMax9 insertion. The partial protein sequences from different ORFs were used for database searches, but no significant homologies with known proteins were found.

[0154] In a further approach, it was determined whether a known gene, like vacA, encoding the extracellular vacuolating cytotoxin of H. pylori, could be identified using this method and how often such a mutation would be represented in the mutant library. Total cell lysates of the 135 mutants were tested in an immunoblot using the H. pylori cytotoxin-specific rabbit antiserum AK197 (Schmitt et al., Mol. Microbiol. 12:307-319, 1994). Two mutants were identified that no longer produced the cytotoxin antigen (mutants P1-26 and P1-47), and partial DNA sequencing of the insertion sites revealed that TnMax9 was inserted at distinct positions in the vacA gene, 56 and 53 codons downstream of the ATG start codon, respectively.

[0155] Thus, the characterization of the mutant collection confirmed that a representative gene library was constructed in E. coli, in which target genes encoding exported H. pylori proteins were efficiently tagged by TnMax9.

[0156] In order to establish a collection of mutants lacking distinct exported proteins, the mutations had to be transferred back into the H. pylori chromosome. By means of natural transformation, 86 plasmids could be transformed into the original strain P1. H. pylori strains P1 or P12, which were naturally competent for DNA transformation, were transformed with circular plasmid DNA (0.2-0.5 μg/transformation). Transformations to streptomycin resistance were performed with chromosomal DNA (1 μg/transformation), isolated from a streptomycin-resistant NCTC11637 H. pylori mutant according to the procedure described in Haas et al. (Mol. Microbiol. 8:753-760). Selection was performed on serum plates containing 4 μg/mL chloramphenicol or 500 μg/mL streptomycin. The transformation frequency for a given mutant was calculated as the number of chloramphenicol-, streptomycin-, or erythromycin-resistant colonies per cfu (average of three experiments). The blaM gene was deleted by NotI digestion, and the plasmid religated, in those plasmids that did not transform strain P1 directly.

[0157] This procedure, which gave a twenty- to thirty-fold higher frequency of transformation, as compared to the same plasmid containing blaM, resulted in 36 additional mutants of strain P1. The blaM-deletion plasmids that still did not transform strain P1 were used to transform the heterologous H. pylori strain P12, which possesses an approximately 10-fold higher transformation frequency compared to P1. This resulted in thirteen further mutants.

[0158] Thus, from the 192 amp^(R) plasmids a total of 135 H. pylori mutants (122 mutants in P1 and 13 mutants in P12) were finally obtained by selection for chloramphenicol resistance (70%). The transformation frequency varied between different plasmids in the range of 1×10⁻⁵ to 1×10⁻⁷. The remaining plasmids did not result in any transformants. The collection was frozen as individual mutants in stock cultures at −70° C. To verify the correct insertion of the mini-transposon into the H. pylori chromosome, ten representative mutants were tested by Southern hybridization of chromosomal DNA using cat_(GC) DNA and the vector pMin2 as probes. Consistent with our previous experience concerning TnMax9-based shuttle mutagenesis of H. pylori, the mini-transposon was, in all cases, inserted into the chromosome without integration of the vector DNA, which probably means by a double cross-over, rather than by a single cross-over event. As judged from the hybridization pattern obtained with the cat gene as a probe, it appears that TnMax9 is located in different regions of the chromosome, showing that distinct target genes have been interrupted in individual mutants.
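The yields reported for the successive steps of the procedure can be reconciled with a brief tally. The sketch below uses only the counts stated in the description (198 pools, 158 positive pools, 192 amp^(R) plasmids, and 86, 36, and 13 mutants from the three transformation rounds); the variable names and dictionary labels are illustrative assumptions.

```python
TOTAL_POOLS = 198
POSITIVE_POOLS = 158   # pools yielding ampicillin-resistant transconjugants
AMP_R_PLASMIDS = 192   # amp(R) clones isolated from the positive pools

# Mutants obtained in each transformation round, as reported.
MUTANTS = {
    ("P1", "original plasmids"): 86,
    ("P1", "blaM-deleted plasmids"): 36,
    ("P12", "blaM-deleted plasmids"): 13,
}

p1_mutants = sum(n for (strain, _), n in MUTANTS.items() if strain == "P1")
p12_mutants = sum(n for (strain, _), n in MUTANTS.items() if strain == "P12")
total_mutants = p1_mutants + p12_mutants

positive_pool_pct = round(100 * POSITIVE_POOLS / TOTAL_POOLS)
mutant_yield_pct = round(100 * total_mutants / AMP_R_PLASMIDS)

print(f"positive pools: {positive_pool_pct}%")
print(f"mutants: {p1_mutants} in P1 + {p12_mutants} in P12 = {total_mutants}")
print(f"mutant yield from amp(R) plasmids: {mutant_yield_pct}%")
```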

[0159] The mutants were analyzed for motility, transformation competence, and adherence to KatoIII cells. Screening of the H. pylori mutant collection allowed identification of mutants impaired in motility, natural transformation competence, and adherence to gastric epithelial cell lines. Motility mutants could be grouped into distinct classes: (i) mutants lacking the major flagellin subunit FlaA and intact flagella; (ii) mutants with apparently normal flagella, but reduced motility; and (iii) mutants with obviously normal flagella, but completely abolished motility. Two independent mutations, which exhibited defects in natural competence for genetic transformation, mapped to different genetic loci. In addition, two independent mutants were isolated by their failure to bind to the human gastric carcinoma cell line KatoIII. Both mutants carried a transposon in the same gene, approximately 0.8 kb apart, and showed decreased autoagglutination, when compared to the wild type strain.

[0160] Sequences of clones obtained using the above-described transposon shuttle mutagenesis method were used to identify intact genes, lacking inserted transposons, in the H. pylori genome, as is described below in Example 1.

[0161] The polynucleotides of the present application are full length clones of several partial clones set forth in the parent application, U.S. Ser. No. 08/749,051, filed on Nov. 14, 1996. The relationship between the clones of the present application and those of the parent application is as follows:

GHPO (present application)    HPO (parent application)
1012                          121
1190                          76
1398                          15
1501                          45
1550                          38
1620                          87
276                           42
329                           70
470                           132
574                           71
689                           9
706                           56

[0162] The invention is further illustrated by the following examples. Example 1 describes identification of genes, such as genes that encode the polypeptides of the invention, in the Helicobacter genome, as well as identification of leader sequences, and primer design for amplification of genes lacking signal sequences. Example 2 describes cloning of DNA encoding GHPO 276, GHPO 574, and GHPO 689 into a vector that provides a histidine tag, and production and purification of the resulting his-tagged fusion proteins. Example 3 describes methods for cloning DNA encoding the polypeptides of the invention so that they can be produced without his-tags, and Example 4 describes methods for purifying recombinantly produced polypeptides of the invention.

EXAMPLE 1 Identification of Genes in the H. pylori Genome, Identification of Leader Sequences, and Primer Design for Amplification of Genes Lacking Signal Sequences

[0163] 1.A. Creating H. pylori Genomic Databases

[0164] The H. pylori genome was provided as a text file containing a single contiguous string of nucleotides that had been determined to be 1.76 Megabases in length. The complete genome was split into 17 separate files using the program SPLIT (Creativity in Action), giving rise to 16 contigs, each containing 100,000 nucleotides, and a 17th contig containing the remaining 76,000 nucleotides. A header was added to each of the 17 files using the format: >hpg0.txt (representing contig 1), >hpg1.txt (representing contig 2), etc. The resulting 17 files, named hpg0 through hpg16, were then copied together to form one file that represented the plus strand of the complete H. pylori genome. The constructed database was given the designation “H.” A negative strand database of the H. pylori genome was created similarly by first creating a reverse complement of the positive strand using the program SeqPup (D. G. Gilbert, Indiana University Biology Department) and then performing the same procedure as described above for the plus strand. This database was given the designation “N.”
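The database construction described above can be approximated in a few lines of code. The following is a sketch only, written in Python rather than with the SPLIT and SeqPup programs named in the text: it cuts a nucleotide string into fixed-size contigs, prefixes each with a header of the form >hpgN.txt, and builds the minus strand by reverse complementation. The function names, the `prefix` parameter, and the toy 1,676,000-nucleotide string (16 × 100,000 + 76,000, matching the contig sizes given) are assumptions made for illustration.

```python
def split_into_contigs(genome, size=100_000):
    """Cut a sequence into consecutive chunks of `size` nucleotides."""
    return [genome[i:i + size] for i in range(0, len(genome), size)]


# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_database(genome, prefix="hpg", size=100_000):
    """Concatenate headed contigs into one FASTA-style text database."""
    records = []
    for index, contig in enumerate(split_into_contigs(genome, size)):
        records.append(f">{prefix}{index}.txt\n{contig}")
    return "\n".join(records) + "\n"


# Toy stand-in for the genome text file (not real sequence data).
toy_genome = "ACGT" * 419_000                       # 1,676,000 nucleotides
contigs = split_into_contigs(toy_genome)
plus_db = build_database(toy_genome)                      # analogous to "H"
minus_db = build_database(reverse_complement(toy_genome)) # analogous to "N"

print(len(contigs), len(contigs[0]), len(contigs[-1]))
```

Keeping the contig index in each header is what later allows a search hit to be traced back to a coordinate on a specific contig.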

[0165] The regions predicted to encode open reading frames (ORFs) were defined for the complete H. pylori genome using the program GENEMARK™ (Borodovsky et al., Comp. Chem. 17:123, 1993). A database was created from a text file containing an annotated version of all ORFs predicted to be encoded by the H. pylori genome for both the plus and minus strands, and was given the designation “O.” Each ORF was assigned a number indicating its location on the genome and its position relative to other genes. No manipulation of the text file was required.

[0166] 1.B. Searching the H. pylori Databases

[0167] The databases constructed as described above were searched using the program FASTA (Pearson et al., Proc. Natl. Acad. Sci. USA 85:2444-2448, 1988). FASTA was used for searching a DNA sequence against either of the gene databases (“H” and/or “N”), or a peptide sequence against the ORF library (“O”). TFASTX was used to search a peptide sequence against all possible reading frames of a DNA database (“H” and/or “N” libraries). With potential frameshifts also being resolved, FASTX was used for searching the translated reading frames of a DNA sequence against either a DNA database, or a peptide sequence against the protein database.
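
The idea of comparing a peptide against all possible reading frames of a DNA sequence can be illustrated with a six-frame translation. The sketch below is not the FASTA, TFASTX, or FASTX algorithm: those programs score approximate alignments and can resolve frameshifts, whereas this hypothetical helper only looks for an exact substring match. It assumes the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Return the three plus-strand and three minus-strand translations."""
    plus = dna.upper()
    minus = plus.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(plus[offset:])
        frames[f"-{offset + 1}"] = translate(minus[offset:])
    return frames


def find_peptide(dna: str, peptide: str) -> list:
    """List (frame, position) pairs where the peptide occurs exactly."""
    return [(name, protein.find(peptide))
            for name, protein in six_frames(dna).items()
            if peptide in protein]


if __name__ == "__main__":
    # First five codons of the coding sequence of SEQ ID NO:1, shifted by 2 nt.
    print(find_peptide("CC" + "ATGGATAAAAACAAC", "MDKN"))
```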

[0168] 1.C. Isolation of DNA Sequences from the H. pylori Genome

[0169] The FASTA searches against the constructed DNA databases identified exact nucleotide coordinates on one or more of the isolated contigs, and therefore the location of the target DNA. Once the exact location of the target sequence was known, the contig identified to carry the gene was exported into the software package MapDraw (DNAStar, Inc.) and the gene was isolated. Gene sequences with flanking DNA were then excised and copied into the EditSeq software package (DNAStar, Inc.) for further analysis.

[0170] 1.D. Identification of Leader Sequences

[0171] The deduced protein encoded by a target gene sequence is analyzed using the PROTEAN software package (DNAStar, Inc.). This analysis predicts those areas of the protein that are hydrophobic by using the Kyte-Doolittle algorithm, and identifies any potential polar residues preceding the hydrophobic core region, which is typical for many leader sequences. For confirmation, the target protein is then searched against a PROSITE database (DNAStar, Inc.) consisting of motifs and signatures. The identification of predicted prokaryotic lipid attachment sites is characteristic of many leader sequences and of hydrophobic regions in general. Where confirmation between the two approaches is apparent at the N-terminus of any protein, putative cleavage sites are sought. Specifically, this includes the presence of either an Alanine (A), Serine (S), or Glycine (G) residue immediately after the core hydrophobic region. In the case of lipoproteins, a Cysteine (C) residue would be identified as the +1 residue, post-cleavage.
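
The hydropathy step of this analysis can be sketched as a sliding-window average over the Kyte-Doolittle scale, followed by a scan for A, S, or G residues just after the most hydrophobic window. This is a minimal sketch and not the PROTEAN implementation; the window length, the N-terminal search limit, and the look-ahead distance are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Mean Kyte-Doolittle score of every window along the protein."""
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def most_hydrophobic_window(protein: str, window: int = 7,
                            search_limit: int = 40) -> tuple:
    """Return (start index, mean score) of the best N-terminal window."""
    profile = hydropathy_profile(protein[:search_limit], window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


def candidate_cleavage_positions(protein: str, core_end: int,
                                 lookahead: int = 10) -> list:
    """0-based indexes of A, S, or G residues just after the hydrophobic core."""
    stop = min(core_end + lookahead, len(protein))
    return [i for i in range(core_end, stop) if protein[i] in "ASG"]


if __name__ == "__main__":
    # N-terminal residues of SEQ ID NO:4 as printed in the sequence listing.
    n_term = "MSIKRVRLKIFVLLMSVILGISLTGCIGYRMDL"
    window = 7
    start, score = most_hydrophobic_window(n_term, window)
    print("core window starts at", start, "mean hydropathy", round(score, 2))
    print("A/S/G after core:",
          candidate_cleavage_positions(n_term, start + window))
    print("Cys positions (possible lipoprotein +1 residue):",
          [i for i, aa in enumerate(n_term) if aa == "C"])
```

A motif search such as the PROSITE lipid attachment pattern would be run separately; the scan above only narrows down where a cleavage site could plausibly lie.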

[0172] 1.E. Rational Design of PCR Primers Based on the Identification of Leader Sequences

[0173] In order to clone gene sequences as N-terminus translational fusions for the generation of recombinant proteins with N-terminal Histidine tags, the gene sequence that specifies the leader sequence is omitted. The 5′-end of the gene-specific portion of the N-terminal primer is designed to start at the first codon beyond the cleavage site. In the case of lipoproteins, the 5′-end of the N-terminal primer begins at the second codon, immediately after the modifiable residue at position +1 post-cleavage. The omission of the leader sequence from the recombinant allows for one-step purification, and potential problems associated with insertion of leader sequences in the membrane of the host strain carrying the hybrid construct are avoided.
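
The primer design rule described above can be expressed as a short function. The sketch below is illustrative: the clamp and restriction site defaults are taken from the GHPO 276 primers listed in Example 2, while the length of the gene-specific portion and the toy coding sequence are assumptions, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(cds: str, leader_codons: int, gene_nt: int = 23,
                   lipoprotein: bool = False,
                   clamp5: str = "CGC", site5: str = "GGATCC",
                   clamp3: str = "CCG", site3: str = "CTCGAG") -> tuple:
    """Build N-terminal and C-terminal primers for a leaderless construct.

    cds           coding sequence from the start codon through the stop codon
    leader_codons number of codons in the leader, up to the cleavage site
    lipoprotein   if True, also skip the +1 codon (the modifiable residue)
    """
    skip = leader_codons + (1 if lipoprotein else 0)
    first = 3 * skip
    n_terminal = clamp5 + site5 + cds[first:first + gene_nt]
    # The C-terminal primer anneals to the plus strand, so it carries the
    # reverse complement of the gene's 3' end, stop codon included.
    c_terminal = clamp3 + site3 + reverse_complement(cds[-gene_nt:])
    return n_terminal, c_terminal


if __name__ == "__main__":
    # Toy coding sequence: 3-codon "leader", 6 mature codons, stop codon.
    toy_cds = "ATGAAAAAA" + "GCTGATGAACGTTTTGGG" + "TAA"
    print(design_primers(toy_cds, leader_codons=3, gene_nt=9))
    print(design_primers(toy_cds, leader_codons=3, gene_nt=9, lipoprotein=True))
```

In practice the gene-specific length would be adjusted per gene to reach a suitable annealing temperature; a fixed length is used here only to keep the sketch short.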

EXAMPLE 2 Preparation of Isolated DNA Encoding GHPO 276, GHPO 574, and GHPO 689, and Production of these Polypeptides as Histidine-Tagged Fusion Proteins

[0174] 2.A. Preparation of Genomic DNA from Helicobacter pylori

[0175] Helicobacter pylori strain ORV2001, stored in LB medium containing 50% glycerol at −70° C., is grown on Columbia agar containing 7% sheep blood for 48 hours under microaerophilic conditions (8-10% CO₂, 5-7% O₂, 85-87% N₂). Cells are harvested, washed with phosphate-buffered saline (PBS) (pH 7.2), and DNA is then extracted from the cells using the Rapid Prep Genomic DNA Isolation kit (Pharmacia Biotech).

[0176] 2.B. PCR Amplification

[0177] DNA molecules encoding GHPO 276, GHPO 574, and GHPO 689 are amplified from genomic DNA, which can be prepared as described above, by the Polymerase Chain Reaction (PCR) using the following primers:

[0178] GHPO 276:

[0179] N-terminal primer:

[0180] 5′-CGCGGATCCGAACTCTTTTTAAGTCAAGCAAT-3′ (SEQ ID NO:61); and

[0181] C-terminal primer:

[0182] 5′-CCGCTCGAGTTAAAATTTGTAAGTCAAATTC-3′ (SEQ ID NO:62).

[0183] GHPO 574:

[0184] N-terminal primer:

[0185] 5′-GGGAATTCTTGAAATTTAAATATG-3′ (SEQ ID NO:63); and

[0186] C-terminal primer:

[0187] 5′-CCGCTCGAGTTAAAAAATAAACGC-3′ (SEQ ID NO:64).

[0188] GHPO 689:

[0189] N-terminal primer:

[0190] 5′-CGCGGATCCGAAAAGAAAAGGATAACCCCTTGC-3′ (SEQ ID NO:65);

[0191] and

[0192] C-terminal primer:

[0193] 5′-CCGCTCGAGTCAAAAGCCTATGTTGTAGCC-3′ (SEQ ID NO:66).

[0194] The N-terminal and C-terminal primers for each clone both include a 5′ clamp and a restriction enzyme recognition sequence for cloning purposes (BamHI (GGATCC) and XhoI (CTCGAG) recognition sequences). The N-terminal primer is designed so that the amplified product does not encode the leader sequence and the potential cleavage site.
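
The layout of the listed primers (5′ clamp, recognition sequence, gene-specific portion) can be checked with a small parser. This is a sketch with a hypothetical function name. The BamHI and XhoI sequences are those given above; the EcoRI entry (GAATTC) is an addition made for this illustration, because the GHPO 574 N-terminal primer as printed contains GAATTC rather than GGATCC.

```python
# Recognition sequences; EcoRI is an addition for this illustration.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG", "EcoRI": "GAATTC"}

PRIMERS = {
    "GHPO 276 N-terminal (SEQ ID NO:61)": "CGCGGATCCGAACTCTTTTTAAGTCAAGCAAT",
    "GHPO 276 C-terminal (SEQ ID NO:62)": "CCGCTCGAGTTAAAATTTGTAAGTCAAATTC",
    "GHPO 574 N-terminal (SEQ ID NO:63)": "GGGAATTCTTGAAATTTAAATATG",
    "GHPO 574 C-terminal (SEQ ID NO:64)": "CCGCTCGAGTTAAAAAATAAACGC",
    "GHPO 689 N-terminal (SEQ ID NO:65)": "CGCGGATCCGAAAAGAAAAGGATAACCCCTTGC",
    "GHPO 689 C-terminal (SEQ ID NO:66)": "CCGCTCGAGTCAAAAGCCTATGTTGTAGCC",
}


def dissect_primer(primer: str):
    """Split a primer into clamp, enzyme name, and gene-specific portion.

    Returns None when none of the known recognition sequences is present.
    """
    for enzyme, site in SITES.items():
        position = primer.find(site)
        if position != -1:
            return {
                "clamp": primer[:position],
                "enzyme": enzyme,
                "gene_specific": primer[position + len(site):],
            }
    return None


if __name__ == "__main__":
    for name, sequence in PRIMERS.items():
        print(name, dissect_primer(sequence))
```

For the C-terminal primers, the gene-specific portion begins with the reverse complement of the stop codon (TTA or TCA), as expected for a primer that anneals to the coding strand.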

[0195] Amplification of gene-specific DNA is carried out using Vent DNA Polymerase (New England Biolabs) or Taq DNA polymerase (Appligene) according to the manufacturer's instructions. The reaction mixture, which is brought to a final volume of 100 μL with distilled water, is as follows:

dNTPs mix: 200 μM
10x ThermoPol buffer: 10 μL
primers: 300 nM each
DNA template: 50 ng
Heat-stable DNA polymerase: 2 units
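
Pipetting volumes for this reaction mixture follow from C1·V1 = C2·V2 once stock concentrations are fixed. The sketch below uses the final amounts stated above; every stock concentration (10 mM dNTP mix, 10 μM primers, 50 ng/μL template, 2 units/μL polymerase) is an assumption made for illustration and is not given in the text.

```python
def volume_ul(final_conc: float, stock_conc: float, total_ul: float) -> float:
    """Volume of stock needed so that it reaches final_conc in total_ul."""
    return final_conc / stock_conc * total_ul


def reaction_mix(total_ul: float = 100.0,
                 dntp_stock_um: float = 10_000.0,      # assumed 10 mM stock
                 primer_stock_nm: float = 10_000.0,    # assumed 10 uM stock
                 template_ng_per_ul: float = 50.0,     # assumed concentration
                 polymerase_u_per_ul: float = 2.0) -> dict:  # assumed activity
    """Return component volumes in microliters for one reaction."""
    mix = {
        "dNTPs mix": volume_ul(200.0, dntp_stock_um, total_ul),
        "10x ThermoPol buffer": total_ul / 10.0,
        "N-terminal primer": volume_ul(300.0, primer_stock_nm, total_ul),
        "C-terminal primer": volume_ul(300.0, primer_stock_nm, total_ul),
        "DNA template": 50.0 / template_ng_per_ul,
        "DNA polymerase": 2.0 / polymerase_u_per_ul,
    }
    # Distilled water brings the reaction to its final volume.
    mix["distilled water"] = total_ul - sum(mix.values())
    return mix


if __name__ == "__main__":
    for component, microliters in reaction_mix().items():
        print(f"{component}: {microliters:.1f} uL")
```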

[0196] Appropriate amplification reaction conditions can readily be determined by one skilled in the art. In the present case, the following conditions were used for amplification of DNA encoding GHPO 276 with Vent DNA polymerase: a denaturing step was carried out at 97° C. for 30 seconds, followed by an annealing step at 55° C. for 45 seconds, and an extension step at 72° C. for 1 minute and 30 seconds. Twenty-five cycles were carried out. For amplification of DNA encoding GHPO 689, Taq DNA polymerase was used, and a denaturing step was carried out at 95° C. for 30 seconds, followed by an annealing step at 50° C. for 1 minute, and an extension step at 72° C. for 2 minutes and 30 seconds. Twenty-five cycles were carried out.
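
The two cycling programs can be recorded as simple data, which also gives the total hold time of each run. The temperatures, durations, and cycle count below are those stated above; the class and field names are hypothetical, and the computed time ignores instrument ramping and any initial or final holds, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    polymerase: str
    denature: tuple   # (temperature in deg C, seconds)
    anneal: tuple     # (temperature in deg C, seconds)
    extend: tuple     # (temperature in deg C, seconds)
    cycles: int = 25

    def hold_seconds(self) -> int:
        """Total programmed hold time, excluding temperature ramps."""
        per_cycle = self.denature[1] + self.anneal[1] + self.extend[1]
        return per_cycle * self.cycles


PROGRAMS = {
    "GHPO 276": CyclingProgram("Vent", (97, 30), (55, 45), (72, 90)),
    "GHPO 689": CyclingProgram("Taq", (95, 30), (50, 60), (72, 150)),
}

if __name__ == "__main__":
    for target, program in PROGRAMS.items():
        minutes = program.hold_seconds() / 60
        print(f"{target} ({program.polymerase}): {minutes:.2f} min")
```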

[0197] 2.C. Transformation and Selection of Transformants

[0198] A single PCR product is thus amplified and is then digested at 37° C. for 2 hours with BamHI and XhoI concurrently in a 20 μL reaction volume. The digested product is ligated to similarly cleaved pET28a (Novagen) that is dephosphorylated prior to the ligation by treatment with Calf Intestinal Alkaline Phosphatase (CIP). The gene fusion constructed in this manner allows one-step affinity purification of the resulting fusion protein because of the presence of histidine residues at the N-terminus of the fusion protein, which are encoded by the vector.

[0199] The ligation reaction (20 μL) is carried out at 14° C. overnight and then is used to transform 100 μL fresh E. coli XL1-blue competent cells (Novagen). The cells are incubated on ice for 2 hours, heat-shocked at 42° C. for 30 seconds, and returned to ice for 90 seconds. The samples are then added to 1 mL LB broth in the absence of selection and grown at 37° C. for 2 hours. The cells are plated out on LB agar containing kanamycin (50 μg/mL) at a 10× and neat dilution and incubated overnight at 37° C. The following day, 50 colonies are picked onto secondary plates and incubated at 37° C. overnight.

[0200] Five colonies are picked into 3 mL LB broth supplemented with kanamycin (100 μg/mL) and are grown overnight at 37° C. Plasmid DNA is extracted using the Qiagen mini-prep method and is quantitated by agarose gel electrophoresis.

[0201] PCR is performed with the gene-specific primers under the conditions set forth above and transformant DNA is confirmed to contain the desired insert. If PCR-positive, one of the five plasmid DNA samples (500 ng) extracted from the E. coli XL1-blue cells is used to transform competent E. coli BL21 (λDE3) cells (Novagen; as described previously). Transformants (10) are picked onto selective kanamycin (50 μg/mL) containing LB agar plates and stored as a research stock in LB containing 50% glycerol.

[0202] 2.D. Purification of Recombinant Proteins

[0203] One mL of frozen glycerol stock prepared as described in 2.C. is used to inoculate 50 mL of LB medium containing 25 μg/mL of kanamycin in a 250 mL Erlenmeyer flask. The flask is incubated at 37° C. for 2 hours or until the absorbance at 600 nm (OD₆₀₀) reaches 0.4-1.0. The culture is stopped from growing by placing the flask at 4° C. overnight. The following day, 10 mL of the overnight culture are used to inoculate 240 mL LB medium containing kanamycin (25 μg/mL), with the initial OD₆₀₀ about 0.02-0.04. Four flasks are inoculated for each ORF. The cells are grown to an OD₆₀₀ of 1.0 (about 2 hours at 37° C.), a 1 mL sample is harvested by centrifugation, and the sample is analyzed by SDS-PAGE to detect any leaky expression. The remaining culture is induced with 1 mM IPTG and the induced cultures are grown for an additional 2 hours at 37° C.
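
The relationship between the overnight culture density and the stated starting OD₆₀₀ follows from the dilution factor alone. A minimal sketch, assuming OD scales linearly with dilution in this range and using a hypothetical function name:

```python
def od_after_inoculation(od_inoculum: float, inoculum_ml: float,
                         medium_ml: float) -> float:
    """OD600 immediately after adding the inoculum to fresh medium."""
    return od_inoculum * inoculum_ml / (inoculum_ml + medium_ml)


if __name__ == "__main__":
    # 10 mL of overnight culture into 240 mL of LB is a 25-fold dilution.
    for overnight_od in (0.5, 1.0):
        start = od_after_inoculation(overnight_od, 10.0, 240.0)
        print(f"overnight OD600 {overnight_od} -> starting OD600 {start:.3f}")
    # Four flasks of 250 mL each give 1 L of culture per ORF.
    print("total culture volume (mL):", 4 * (10 + 240))
```

Under this assumption, a starting OD₆₀₀ of 0.02-0.04 corresponds to an overnight culture at about 0.5-1.0, and the four flasks together make up the 1 L of culture referred to in the harvesting step.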

[0204] The final OD₆₀₀ is taken and the cells are harvested by centrifugation at 5,000× g for 15 minutes at 4° C. The supernatant is discarded and the pellets are resuspended in 50 mM Tris-HCl (pH 8.0), 2 mM EDTA. Two hundred and fifty mL of buffer are used for 1 L of culture and the cells are recovered by centrifugation at 12,000× g for 20 minutes. The supernatant is discarded and the pellets are stored at −45° C.

[0205] 2.E. Protein Purification

[0206] Pellets obtained from 2.D. are thawed and resuspended in 95 mL of 50 mM Tris-HCl (pH 8.0). Pefabloc and lysozyme are added to final concentrations of 100 μM and 100 μg/mL, respectively. The mixture is homogenized with magnetic stirring at 5° C. for 30 minutes. Benzonase (Merck) is added at a 1 U/mL final concentration, in the presence of 10 mM MgCl₂, to ensure total digestion of the DNA. The suspension is sonicated (Branson Sonifier 450) for 3 cycles of 2 minutes each at maximum output. The homogenate is centrifuged at 19,000× g for 15 minutes and both the supernatant and the pellet are analyzed by SDS-PAGE to detect the cellular location of the target protein in the soluble or insoluble fractions, as is described further below.

[0207] 2.E.1. Soluble Fraction

[0208] If the target protein is produced in a soluble form (i.e., in the supernatant obtained in 2.E.), NaCl and imidazole are added to the supernatant to final concentrations of 50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, and 10 mM imidazole (buffer A). The mixture is filtered through a 0.45 μm membrane and loaded onto an IMAC column (Pharmacia HiTrap chelating Sepharose; 1 mL), which has been charged with nickel ions according to the manufacturer's recommendations. After loading, the column is washed with 50 column volumes of buffer A and the recombinant target protein is eluted with 5 mL of buffer B (50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, 500 mM imidazole).

[0209] The elution profile is monitored by measuring the absorbance of the fractions at 280 nm. Fractions corresponding to the protein peak are pooled, dialyzed against PBS containing 0.5 M arginine, filtered through a 0.22 μm membrane, and stored at −45° C.

[0210] 2.E.2. Insoluble Fraction

[0211] If the target protein is expressed in the insoluble fraction (pellets obtained from 2.E.), purification is conducted under denaturing conditions. NaCl, imidazole, and urea are added to the resuspended pellet to final concentrations of 50 mM Tris-HCl (pH 8.0), 0.5 M NaCl, 10 mM imidazole, and 6 M urea (buffer C). After complete solubilization, the mixture is filtered through a 0.45 μm membrane and loaded onto an IMAC column.

[0212] The purification procedures on the IMAC column are the same as described in 2.E.1., except that 6 M urea is included in all buffers used and 10 column volumes of buffer C are used to wash the column after protein loading, instead of 50 column volumes.

[0213] The protein fractions eluted from the IMAC column with buffer D (buffer C containing 500 mM imidazole) are pooled. Arginine is added to the solution to a final concentration of 0.5 M and the mixture is dialyzed against PBS containing 0.5 M arginine and various concentrations of urea (4 M, 3 M, 2 M, 1 M, and 0.5 M) to progressively decrease the concentration of urea. The final dialysate is filtered through a 0.22 μm membrane and stored at −45° C.

[0214] Alternatively, when the above purification process is not as efficient as it should be, two other processes may be used as follows. A first alternative involves the use of a mild denaturant, N-octyl glucoside (NOG). Briefly, a pellet obtained in 2.E. is homogenized in 5 mM imidazole, 500 mM sodium chloride, 20 mM Tris-HCl (pH 7.9) by microfluidization at a pressure of 15,000 psi and is clarified by centrifugation at 4,000-5,000× g. The pellet is recovered, resuspended in 50 mM NaPO₄ (pH 7.5) containing 1-2% weight/volume NOG, and homogenized. The NOG-soluble impurities are removed by centrifugation. The pellet is extracted once more by repeating the preceding extraction step. The pellet is dissolved in 8 M urea, 50 mM Tris (pH 8.0). The urea-solubilized protein is diluted with an equal volume of 2 M arginine, 50 mM Tris (pH 8.0), and is dialyzed against 1 M arginine for 24-48 hours to remove the urea. The final dialysate is filtered through a 0.22 μm membrane and stored at −45° C.

[0215] A second alternative involves the use of a strong denaturant, such as guanidine hydrochloride. Briefly, a pellet obtained in 2.E. is homogenized in 5 mM imidazole, 500 mM sodium chloride, 20 mM Tris-HCl (pH 7.9) by microfluidization at a pressure of 15,000 psi and clarified by centrifugation at 4,000-5,000× g. The pellet is recovered, resuspended in 6 M guanidine hydrochloride, and passed through an IMAC column charged with Ni++. The bound antigen is eluted with 8 M urea (pH 8.5). Beta-mercaptoethanol is added to the eluted protein to a final concentration of 1 mM, then the eluted protein is passed through a Sephadex G-25 column equilibrated in 0.1 M acetic acid. Protein eluted from the column is slowly added to 4 volumes of 50 mM phosphate buffer (pH 7.0). The protein remains in solution.

[0216] 2.F. Evaluation of the Protective Activity of the Purified Protein

[0217] Groups of 10 OF1 mice (IFFA Credo) are immunized rectally with 25 μg of the purified recombinant protein, admixed with 1 μg of cholera toxin (Berna) in physiological buffer. Mice are immunized on days 0, 7, 14, and 21. Fourteen days after the last immunization, the mice are challenged with H. pylori strain ORV2001 grown in liquid media (the cells are grown on agar plates, as described in 2.A., and, after harvest, the cells are resuspended in Brucella broth; the flasks are then incubated overnight at 37° C.). Fourteen days after challenge, the mice are sacrificed and their stomachs are removed. The amount of H. pylori is determined by measuring the urease activity in the stomach and by culture.
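
The timing of the protection experiment can be laid out as day offsets from the first immunization. The intervals are those stated above; the function name and the dictionary layout are hypothetical.

```python
def protection_schedule(immunization_days=(0, 7, 14, 21),
                        days_to_challenge: int = 14,
                        days_to_sacrifice: int = 14) -> dict:
    """Return the study days for immunization, challenge, and sacrifice."""
    challenge_day = max(immunization_days) + days_to_challenge
    sacrifice_day = challenge_day + days_to_sacrifice
    return {
        "immunizations": list(immunization_days),
        "challenge": challenge_day,
        "sacrifice": sacrifice_day,
    }


if __name__ == "__main__":
    schedule = protection_schedule()
    print(schedule)
    # Total antigen and adjuvant per mouse over the four immunizations.
    print("antigen per mouse (ug):", 25 * len(schedule["immunizations"]))
    print("cholera toxin per mouse (ug):", 1 * len(schedule["immunizations"]))
```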

[0218] 2.G. Production of Monospecific Polyclonal Antibodies

[0219] 2.G.1. Hyperimmune Rabbit Antiserum

[0220] New Zealand rabbits are injected both subcutaneously and intramuscularly with 100 μg of a purified fusion polypeptide, as obtained in 2.E.1. or 2.E.2., in the presence of Freund's complete adjuvant and in a total volume of approximately 2 mL. Twenty-one and 42 days after the initial injection, booster doses, which are identical to priming doses, except that Freund's incomplete adjuvant is used, are administered in the same way. Fifteen days after the last injection, animal serum is recovered, decomplemented, and filtered through a 0.45 μm membrane.

[0221] 2.G.2. Mouse Hyperimmune Ascites Fluid

[0222] Ten mice are injected subcutaneously with 10-50 μg of a purified fusion polypeptide as obtained in 2.E.1. or 2.E.2., in the presence of Freund's complete adjuvant and in a volume of approximately 200 μL. Seven and 14 days after the initial injection, booster doses, which are identical to the priming doses, except that Freund's incomplete adjuvant is used, are administered in the same way. Twenty-one and 28 days after the initial injection, mice receive 50 μg of the antigen alone intraperitoneally. On day 21, mice are also injected intraperitoneally with sarcoma 180/TG cells CM26684 (Lennette et al., Diagnostic Procedures for Viral, Rickettsial, and Chlamydial Infections, 5th Ed. Washington D.C., American Public Health Association, 1979). Ascites fluid is collected 10-13 days after the last injection.

EXAMPLE 3 Methods for Producing Transcriptional Fusions Lacking His-Tags

[0223] Methods for amplification and cloning of DNA encoding the polypeptides of the invention as transcriptional fusions lacking His-tags are described as follows. Two PCR primers for each clone are designed based upon the sequences of the polynucleotides that encode them (SEQ ID NOs: 1-23 (odd numbers)).

[0224] These primers can be used to amplify DNA encoding the polypeptides of the invention from any Helicobacter pylori strain, including, for example, ORV2001 and the strain deposited as ATCC deposit number 43579, as well as from other Helicobacter species.

[0225] The N-terminal primers are designed to include the ribosome binding site of the target gene, the ATG start site, the leader sequence, and the cleavage site. The N-terminal primers can include a 5′ clamp and a restriction endonuclease recognition site, such as that for BamHI (GGATCC), which facilitates subsequent cloning. Similarly, the C-terminal primers can include a restriction endonuclease recognition site, such as that for XhoI (CTCGAG), which can be used in subsequent cloning, and a TAA stop codon. Specific primers that can be used are listed above.

[0226] Amplification of genes encoding the polypeptides of the invention is carried out using Thermalase DNA Polymerase under the conditions described above in Example 2. Alternatively, Vent DNA polymerase (New England Biolabs), Pwo DNA polymerase (Boehringer Mannheim), or Taq DNA polymerase (Appligene) can be used, according to instructions provided by the manufacturers.

[0227] A single PCR product for each clone is amplified and cloned into appropriately cleaved pET 24 (e.g., BamHI-XhoI cleaved pET 24), resulting in construction of a transcriptional fusion that permits expression of the proteins without His-tags. The expressed products can be purified as denatured proteins that are refolded by dialysis into 1 M arginine.

[0228] Cloning into pET 24 allows transcription of the genes from the T7 promoter, which is supplied by the vector, but relies upon binding of the RNA-specific DNA polymerase to the intrinsic ribosome binding sites of the genes, and thereby expression of the complete ORF. The amplification, digestion, and cloning protocols are as described above for constructing translational fusions.

EXAMPLE 4 Purification of the new Polypeptides by Immunoaffinity

[0229] 4.A. Purification of Specific IgGs

[0230] An immune serum, as prepared in section 2.G., is applied to a protein A Sepharose Fast Flow column (Pharmacia) equilibrated in 100 mM Tris-HCl (pH 8.0). The resin is washed by applying 10 column volumes of 100 mM Tris-HCl and 10 volumes of 10 mM Tris-HCl (pH 8.0) to the column. IgG antibodies are eluted with 0.1 M glycine buffer (pH 3.0) and are collected as 5 mL fractions to which is added 0.25 mL 1 M Tris-HCl (pH 8.0). The optical density of the eluate is measured at 280 nm and the fractions containing the IgG antibodies are pooled, dialyzed against 50 mM Tris-HCl (pH 8.0), and, if necessary, stored frozen at −70° C.

[0231] 4.B. Preparation of the Column

[0232] An appropriate amount of CNBr-activated Sepharose 4B gel (1 g of dried gel provides for approximately 3.5 mL of hydrated gel; gel capacity is from 5 to 10 mg coupled IgG/mL of gel) manufactured by Pharmacia (17-0430-01) is suspended in 1 mM HCl buffer and washed on a Büchner funnel by adding small quantities of 1 mM HCl buffer. The total volume of buffer is 200 mL per gram of gel.
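
The "appropriate amount" of gel can be estimated from the figures given above (3.5 mL of hydrated gel per gram of dried gel, 5 to 10 mg of coupled IgG per mL of gel, 200 mL of 1 mM HCl per gram). The sketch below is illustrative; the function name is hypothetical, and the choice of a working capacity within the stated range is left to the user.

```python
def gel_requirements(igg_mg: float, capacity_mg_per_ml: float = 5.0,
                     swell_ml_per_g: float = 3.5,
                     hcl_ml_per_g: float = 200.0) -> dict:
    """Estimate dried gel mass and wash volume for coupling igg_mg of IgG."""
    hydrated_gel_ml = igg_mg / capacity_mg_per_ml
    dried_gel_g = hydrated_gel_ml / swell_ml_per_g
    return {
        "hydrated_gel_ml": hydrated_gel_ml,
        "dried_gel_g": dried_gel_g,
        "hcl_wash_ml": dried_gel_g * hcl_ml_per_g,
    }


if __name__ == "__main__":
    # The stated capacity range brackets the amount of gel needed.
    for capacity in (5.0, 10.0):
        print(capacity, "mg/mL:", gel_requirements(35.0, capacity))
```

Planning with the lower end of the capacity range gives the more conservative (larger) amount of gel.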

[0233] Purified IgG antibodies are dialyzed for 4 hours at 20±5° C. against 50 volumes of 500 mM sodium phosphate buffer (pH 7.5). The antibodies are then diluted in 500 mM phosphate buffer (pH 7.5) to a final concentration of 3 mg/mL.

[0234] IgG antibodies are mixed with the gel overnight at 5±3° C. The gel is packed into a chromatography column and is washed with 2 column volumes of 500 mM phosphate buffer (pH 7.5), and 1 column volume of 50 mM sodium phosphate buffer, containing 500 mM NaCl (pH 7.5). The gel is then transferred to a tube, mixed with 100 mM ethanolamine (pH 7.5) for 4 hours at room temperature, and washed twice with 2 column volumes of PBS. The gel is then stored in 1/10,000 PBS merthiolate. The amount of IgG antibodies coupled to the gel is determined by measuring the optical density (OD) at 280 nm of the IgG solution and the direct eluate, plus washings.

[0235] 4.C. Adsorption and Elution of the Antigen

[0236] An antigen solution in 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, for example, the supernatant obtained in 2.E. or the solubilized pellet obtained in 2.E., after centrifugation and filtration through a 0.45 μm membrane, is applied to a column equilibrated with 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, at a flow rate of about 10 mL/hour. The column is then washed with 20 volumes of 50 mM Tris-HCl (pH 8.0), 2 mM EDTA. Alternatively, adsorption can be achieved by mixing overnight at 5±3° C.

[0237] The adsorbed gel is washed with 2 to 6 volumes of 10 mM sodium phosphate buffer (pH 6.8) and the antigen is eluted with 100 mM glycine buffer (pH 2.5). The eluate is recovered in 3 mL fractions, to each of which is added 150 μL of 1 M sodium phosphate buffer (pH 8.0). Absorbance is measured at 280 nm for each fraction; those fractions containing the antigen are pooled and stored at −20° C.

[0238] Other embodiments are within the following claims.

1 66 1968 base pairs nucleic acid single linear Coding Sequence 153...1793 (A) NAME/KEY Signal Sequence (B) LOCATION 153...219 (D) OTHER INFORMATION 1 TCTGGGGGCA TTGCTTACCC TACTACTCGC TTGAAACGCC CAAGCCTGAT CCAATCTCAT 60 AAAGATTCTA ATCGCAATTT TAAAACCATC ACTTTTTGGC TCGTTCCCAC AAAAAGCCAC 120 GCAACTTACT ACATCATTAA GGTTTAATCA CA ATG GAT AAA AAC AAC AAT AAT 173 Met Asp Lys Asn Asn Asn Asn -20 CTC CGC TTG ATT TTA GCG ATC GCT CTG TCT TTC TTG TTT ATC GCT CTT 221 Leu Arg Leu Ile Leu Ala Ile Ala Leu Ser Phe Leu Phe Ile Ala Leu -15 -10 -5 1 TAT AGC TAT TTT TTC CAA AAA CCA AAC AAA ACA ACA ACC CAA ACC ACA 269 Tyr Ser Tyr Phe Phe Gln Lys Pro Asn Lys Thr Thr Thr Gln Thr Thr 5 10 15 AAG CAA GAA ACA ACC AAC AAC CAT ACA GCA ACA AGT CCT AAC GCG CCC 317 Lys Gln Glu Thr Thr Asn Asn His Thr Ala Thr Ser Pro Asn Ala Pro 20 25 30 AAC GCC CAA CAT TTT AGC ACC ACT CAA ACA ACC CCC CAA GAG AAT TTG 365 Asn Ala Gln His Phe Ser Thr Thr Gln Thr Thr Pro Gln Glu Asn Leu 35 40 45 CTA AGC ACG ATT TCT TTT GAG CAT GCC AGG ATT GAA ATT GAT TCT TTA 413 Leu Ser Thr Ile Ser Phe Glu His Ala Arg Ile Glu Ile Asp Ser Leu 50 55 60 65 GGG CGC ATC AAA CAG GTT TAT CTC AAG GAT AAA AAG TAT CTA ACC CCT 461 Gly Arg Ile Lys Gln Val Tyr Leu Lys Asp Lys Lys Tyr Leu Thr Pro 70 75 80 AAA CAA AAG GGC TTT TTA GAG CAT GTG GGC CAT CTT TTT AGC TCC AAA 509 Lys Gln Lys Gly Phe Leu Glu His Val Gly His Leu Phe Ser Ser Lys 85 90 95 GAA AAC GCG CAA CCC CCC CTA AAA GAG CTC CCC CTT TTA GCA GCC GAT 557 Glu Asn Ala Gln Pro Pro Leu Lys Glu Leu Pro Leu Leu Ala Ala Asp 100 105 110 AAA CTC AAG CCT TTA GAA GTG CGT TTT TTA GAC CCT ACG CTC AAT AAC 605 Lys Leu Lys Pro Leu Glu Val Arg Phe Leu Asp Pro Thr Leu Asn Asn 115 120 125 AAA GCG TTC AAC ACC CCT TAT AGC GCT TCA AAA ACC ACT CTT GGG CCT 653 Lys Ala Phe Asn Thr Pro Tyr Ser Ala Ser Lys Thr Thr Leu Gly Pro 130 135 140 145 AAC GAA CAG CTT GTT TTA ACC CAA GAT TTA GGC ACT CTT AGC ATC ATT 701 Asn Glu Gln Leu Val Leu Thr Gln Asp Leu Gly Thr Leu Ser Ile Ile 150 155 160 AAA ACC CTG ACT TTC TAT GAT GAT TTG 
CAT TAT GAT TTA AAA ATC GCA 749 Lys Thr Leu Thr Phe Tyr Asp Asp Leu His Tyr Asp Leu Lys Ile Ala 165 170 175 TTC AAA TCG CCC AAT AAC CTT ATC CCT AGC TAT GTG ATC ACC AAT GGT 797 Phe Lys Ser Pro Asn Asn Leu Ile Pro Ser Tyr Val Ile Thr Asn Gly 180 185 190 TAC AGG CCG GTG GCT GAT TTG GAC AGC TAC ACC TTT TCA GGC GTG CTT 845 Tyr Arg Pro Val Ala Asp Leu Asp Ser Tyr Thr Phe Ser Gly Val Leu 195 200 205 TTA GAA AAT AGC GAC AAA AAA ATT GAA AAA ATT GAA GAT AAA GAC GCT 893 Leu Glu Asn Ser Asp Lys Lys Ile Glu Lys Ile Glu Asp Lys Asp Ala 210 215 220 225 AAA GAA ATC AAA CGC TTT TCT AAC ACC CTC TTT TTA TCC AGC GTG GAT 941 Lys Glu Ile Lys Arg Phe Ser Asn Thr Leu Phe Leu Ser Ser Val Asp 230 235 240 AGG TAT TTC ACC ACC TTG CTT TTC ACT AAA GAT CCT CAA GGT TTT GAA 989 Arg Tyr Phe Thr Thr Leu Leu Phe Thr Lys Asp Pro Gln Gly Phe Glu 245 250 255 GCC TTA ATT GAT TCA GAA ATC GGC ACT AAA AAC CCC TTA GGG TTC ATT 1037 Ala Leu Ile Asp Ser Glu Ile Gly Thr Lys Asn Pro Leu Gly Phe Ile 260 265 270 TCC CTT AAA AAT GAA GCG AAT TTG CAT GGC TAT ATT GGC CCT AAG GAT 1085 Ser Leu Lys Asn Glu Ala Asn Leu His Gly Tyr Ile Gly Pro Lys Asp 275 280 285 TAC CGC TCT TTG AAA GCG ATT TCA CCC ATG CTC ACC GAT GTG ATA GAG 1133 Tyr Arg Ser Leu Lys Ala Ile Ser Pro Met Leu Thr Asp Val Ile Glu 290 295 300 305 TAT GGC TTA ATC ACT TTC TTT GCA AAA GGC GTG TTT GTT TTA CTG GAT 1181 Tyr Gly Leu Ile Thr Phe Phe Ala Lys Gly Val Phe Val Leu Leu Asp 310 315 320 TAT TTG TAT CAA TTC GTG GGC AAT TGG GGT TGG GCT ATC ATT CTT TTA 1229 Tyr Leu Tyr Gln Phe Val Gly Asn Trp Gly Trp Ala Ile Ile Leu Leu 325 330 335 ACG ATT ATC GTG CGC ATC ATC CTT TAT CCT TTA AGC TAT AAG GGC ATG 1277 Thr Ile Ile Val Arg Ile Ile Leu Tyr Pro Leu Ser Tyr Lys Gly Met 340 345 350 GTG AGC ATG CAA AAG CTC AAA GAA TTA GCC CCT AAA ATG AAA GAA CTC 1325 Val Ser Met Gln Lys Leu Lys Glu Leu Ala Pro Lys Met Lys Glu Leu 355 360 365 CAA GAA AAA TAC AAG GGC GAA CCC CAA AAA TTG CAA GCC CAC ATG ATG 1373 Gln Glu Lys Tyr Lys Gly Glu Pro Gln Lys Leu Gln Ala His Met Met 370 375 380 385 
CAG CTT TAC AAA AAA CAT GGG GCT AAC CCA CTA GGG GGT TGT CTG CCC 1421 Gln Leu Tyr Lys Lys His Gly Ala Asn Pro Leu Gly Gly Cys Leu Pro 390 395 400 TTA ATC TTA CAA ATC CCG GTG TTT TTT GCC ATT TAT AGA GTG CTT TAT 1469 Leu Ile Leu Gln Ile Pro Val Phe Phe Ala Ile Tyr Arg Val Leu Tyr 405 410 415 AAC GCT GTG GAA TTG AAA AGC TCA GAG TGG ATC TTA TGG ATT CAT GAT 1517 Asn Ala Val Glu Leu Lys Ser Ser Glu Trp Ile Leu Trp Ile His Asp 420 425 430 TTA TCC ATC ATG GAT CCG TAT TTT ATT TTA CCG CTT CTT ATG GGA GCG 1565 Leu Ser Ile Met Asp Pro Tyr Phe Ile Leu Pro Leu Leu Met Gly Ala 435 440 445 TCT ATG TAT TGG CAC CAA AGC GTT ACG CCA AAC ACC ATG ACC GAT CCC 1613 Ser Met Tyr Trp His Gln Ser Val Thr Pro Asn Thr Met Thr Asp Pro 450 455 460 465 ATG CAA GCA AAG ATT TTT AAA CTC TTA CCC CTA TTA TTC ACA ATC TTT 1661 Met Gln Ala Lys Ile Phe Lys Leu Leu Pro Leu Leu Phe Thr Ile Phe 470 475 480 TTA ATC ACT TTC CCG GCA GGG TTA GTC TTG TAT TGG ACC ACG AAC AAC 1709 Leu Ile Thr Phe Pro Ala Gly Leu Val Leu Tyr Trp Thr Thr Asn Asn 485 490 495 ATC CTT TCG GTG TTG CAA CAA CTC ATC ATC AAT AAA GTC TTA GAG AAT 1757 Ile Leu Ser Val Leu Gln Gln Leu Ile Ile Asn Lys Val Leu Glu Asn 500 505 510 AAA AAA CGC ATG CAT GCG CAA AAC AAA AAG GAA CAT TGATGCAAAA TTTTAT 1809 Lys Lys Arg Met His Ala Gln Asn Lys Lys Glu His 515 520 525 TGAAATCAAA GCCAAAACCT TAGAAGAAGC CCTCATTCAA GCTTCTATCG CCTTGAATTG 1869 CCCCATTATT AATTTGCAAT ACGAAGTCAT TCAAACGCCC TCTAAAGGGT TTTTAAGCAT 1929 TGGTAAAAAA GAAGCCATTA TCTTAGCGGG CGTTAAAGA 1968 547 amino acids amino acid single linear protein internal Signal Sequence 1...22 2 Met Asp Lys Asn Asn Asn Asn Leu Arg Leu Ile Leu Ala Ile Ala Leu -20 -15 -10 Ser Phe Leu Phe Ile Ala Leu Tyr Ser Tyr Phe Phe Gln Lys Pro Asn -5 1 5 10 Lys Thr Thr Thr Gln Thr Thr Lys Gln Glu Thr Thr Asn Asn His Thr 15 20 25 Ala Thr Ser Pro Asn Ala Pro Asn Ala Gln His Phe Ser Thr Thr Gln 30 35 40 Thr Thr Pro Gln Glu Asn Leu Leu Ser Thr Ile Ser Phe Glu His Ala 45 50 55 Arg Ile Glu Ile Asp Ser Leu Gly Arg Ile Lys Gln Val Tyr Leu 
Lys 60 65 70 Asp Lys Lys Tyr Leu Thr Pro Lys Gln Lys Gly Phe Leu Glu His Val 75 80 85 90 Gly His Leu Phe Ser Ser Lys Glu Asn Ala Gln Pro Pro Leu Lys Glu 95 100 105 Leu Pro Leu Leu Ala Ala Asp Lys Leu Lys Pro Leu Glu Val Arg Phe 110 115 120 Leu Asp Pro Thr Leu Asn Asn Lys Ala Phe Asn Thr Pro Tyr Ser Ala 125 130 135 Ser Lys Thr Thr Leu Gly Pro Asn Glu Gln Leu Val Leu Thr Gln Asp 140 145 150 Leu Gly Thr Leu Ser Ile Ile Lys Thr Leu Thr Phe Tyr Asp Asp Leu 155 160 165 170 His Tyr Asp Leu Lys Ile Ala Phe Lys Ser Pro Asn Asn Leu Ile Pro 175 180 185 Ser Tyr Val Ile Thr Asn Gly Tyr Arg Pro Val Ala Asp Leu Asp Ser 190 195 200 Tyr Thr Phe Ser Gly Val Leu Leu Glu Asn Ser Asp Lys Lys Ile Glu 205 210 215 Lys Ile Glu Asp Lys Asp Ala Lys Glu Ile Lys Arg Phe Ser Asn Thr 220 225 230 Leu Phe Leu Ser Ser Val Asp Arg Tyr Phe Thr Thr Leu Leu Phe Thr 235 240 245 250 Lys Asp Pro Gln Gly Phe Glu Ala Leu Ile Asp Ser Glu Ile Gly Thr 255 260 265 Lys Asn Pro Leu Gly Phe Ile Ser Leu Lys Asn Glu Ala Asn Leu His 270 275 280 Gly Tyr Ile Gly Pro Lys Asp Tyr Arg Ser Leu Lys Ala Ile Ser Pro 285 290 295 Met Leu Thr Asp Val Ile Glu Tyr Gly Leu Ile Thr Phe Phe Ala Lys 300 305 310 Gly Val Phe Val Leu Leu Asp Tyr Leu Tyr Gln Phe Val Gly Asn Trp 315 320 325 330 Gly Trp Ala Ile Ile Leu Leu Thr Ile Ile Val Arg Ile Ile Leu Tyr 335 340 345 Pro Leu Ser Tyr Lys Gly Met Val Ser Met Gln Lys Leu Lys Glu Leu 350 355 360 Ala Pro Lys Met Lys Glu Leu Gln Glu Lys Tyr Lys Gly Glu Pro Gln 365 370 375 Lys Leu Gln Ala His Met Met Gln Leu Tyr Lys Lys His Gly Ala Asn 380 385 390 Pro Leu Gly Gly Cys Leu Pro Leu Ile Leu Gln Ile Pro Val Phe Phe 395 400 405 410 Ala Ile Tyr Arg Val Leu Tyr Asn Ala Val Glu Leu Lys Ser Ser Glu 415 420 425 Trp Ile Leu Trp Ile His Asp Leu Ser Ile Met Asp Pro Tyr Phe Ile 430 435 440 Leu Pro Leu Leu Met Gly Ala Ser Met Tyr Trp His Gln Ser Val Thr 445 450 455 Pro Asn Thr Met Thr Asp Pro Met Gln Ala Lys Ile Phe Lys Leu Leu 460 465 470 Pro Leu Leu Phe Thr Ile Phe Leu Ile Thr Phe Pro Ala Gly Leu Val 475 480 
485 490 Leu Tyr Trp Thr Thr Asn Asn Ile Leu Ser Val Leu Gln Gln Leu Ile 495 500 505 Ile Asn Lys Val Leu Glu Asn Lys Lys Arg Met His Ala Gln Asn Lys 510 515 520 Lys Glu His 525 1851 base pairs nucleic acid single linear Coding Sequence 238...1665 (A) NAME/KEY Signal Sequence (B) LOCATION 238...313 (D) OTHER INFORMATION 3 GAGCTAGTTT TAAAAAGTTA GTTTTGTTTT AAAAAGTTAA TACTATTTTG AAGCACTCCT 60 ATTCAGATGG CTAAGGCACA CAAGAAATTA GGGGACTCTG CTGTATTCCT ACCCTGAAGC 120 GTTACCCTAA AATCCTATTG CATAGGTCTA AATAAGAGCT TAGGGATCAT TTTAGCCATA 180 AAAAGCTTAT GTTTTCATTA AAAATGTTAT GATACGCTCA AATAGTCAAG CAAAAAA ATG 240 Met -25 TCA ATT AAA AGG GTT AGA TTG AAA ATA TTC GTT CTG TTG ATG TCG GTA 288 Ser Ile Lys Arg Val Arg Leu Lys Ile Phe Val Leu Leu Met Ser Val -20 -15 -10 ATT TTA GGA ATA TCA TTA ACA GGT TGC ATA GGC TAT CGT ATG GAC TTA 336 Ile Leu Gly Ile Ser Leu Thr Gly Cys Ile Gly Tyr Arg Met Asp Leu -5 1 5 GAA CAT TTT AAC ACG CTC TAT TAT GAA GAA AGC CCT AAA AAA GCT TAT 384 Glu His Phe Asn Thr Leu Tyr Tyr Glu Glu Ser Pro Lys Lys Ala Tyr 10 15 20 GAA TAT TCC AAA CAA TTC ACT AAG AAA AAA AAG AAC GCT CTT TTA TGG 432 Glu Tyr Ser Lys Gln Phe Thr Lys Lys Lys Lys Asn Ala Leu Leu Trp 25 30 35 40 GAC TTG CAA AAC GGC TTG AGC GCT TTA TAC GCC AGA GAT TAC CAG ACT 480 Asp Leu Gln Asn Gly Leu Ser Ala Leu Tyr Ala Arg Asp Tyr Gln Thr 45 50 55 TCT TTA GGG GTA TTA GAT CAA GCC GAG CAA CGC TTT GAT AAA ACG CAA 528 Ser Leu Gly Val Leu Asp Gln Ala Glu Gln Arg Phe Asp Lys Thr Gln 60 65 70 AGC GCT TTT ACA AGA GGG GCT GGT TAT GTG GGC GCT ACC ATG ATT AAT 576 Ser Ala Phe Thr Arg Gly Ala Gly Tyr Val Gly Ala Thr Met Ile Asn 75 80 85 GAT AAT GTG CGC GCT TAT GGG GGG AAT ATT TAT GAG GGC GTT TTA ATC 624 Asp Asn Val Arg Ala Tyr Gly Gly Asn Ile Tyr Glu Gly Val Leu Ile 90 95 100 AAT TAT TAC AAA GCG ATA GAC TAC ATG CTT TTA AAC GAT AGC GCG AAA 672 Asn Tyr Tyr Lys Ala Ile Asp Tyr Met Leu Leu Asn Asp Ser Ala Lys 105 110 115 120 GCT AGG GTG CAA TTC AAC CGT GCG AAC GAA CGC CAG CGC AGG GCT AAA 720 Ala Arg Val Gln Phe Asn Arg Ala Asn Glu 
Arg Gln Arg Arg Ala Lys 125 130 135 GAA TTT TAT TAT GAG GAA GTG CAA AAA GCC ATT AAA GAG ATC GAT TCT 768 Glu Phe Tyr Tyr Glu Glu Val Gln Lys Ala Ile Lys Glu Ile Asp Ser 140 145 150 AGC AAA AAG CAC AAT ATT AAT ATG GAA CGC TCT AGG GTG GAA GTG AGC 816 Ser Lys Lys His Asn Ile Asn Met Glu Arg Ser Arg Val Glu Val Ser 155 160 165 GAG ATT TTA AAC AAC ACC TAT TCT AAT TTA GAC AAA TAC GAA GCT TAT 864 Glu Ile Leu Asn Asn Thr Tyr Ser Asn Leu Asp Lys Tyr Glu Ala Tyr 170 175 180 CAG GGC TTA CTT AAC CCG GCG GTT TCG TAT CTC TCA GGG TTG TTT TAC 912 Gln Gly Leu Leu Asn Pro Ala Val Ser Tyr Leu Ser Gly Leu Phe Tyr 185 190 195 200 GCT TTA AAT GGG GAT GAG AAT AAG GGA TTA GGC TAT CTT AAT GAA GCC 960 Ala Leu Asn Gly Asp Glu Asn Lys Gly Leu Gly Tyr Leu Asn Glu Ala 205 210 215 TAT GGG ATC AGT CAA AGC CCT TTT GTA GCC CAA GAC TTG GTT TTT TTC 1008 Tyr Gly Ile Ser Gln Ser Pro Phe Val Ala Gln Asp Leu Val Phe Phe 220 225 230 AAA AAC CCT AAC AGG AGC CAT TTC ACT TGG ATC ATC ATT GAA GAT GGT 1056 Lys Asn Pro Asn Arg Ser His Phe Thr Trp Ile Ile Ile Glu Asp Gly 235 240 245 AAA GAG CCG CAA AAA AGC GAA TTT AAA ATT GAT GTG CCT ATT TTT ATG 1104 Lys Glu Pro Gln Lys Ser Glu Phe Lys Ile Asp Val Pro Ile Phe Met 250 255 260 ATC GAT TCG GTT TAT AAC GTG AGT ATA GCC TTG CCC AAG CTA GAA AAA 1152 Ile Asp Ser Val Tyr Asn Val Ser Ile Ala Leu Pro Lys Leu Glu Lys 265 270 275 280 GGG GAA GCG TTT TAT CAA AAT TTC ACT CTC AAA GAT GGA GAA AAA GTA 1200 Gly Glu Ala Phe Tyr Gln Asn Phe Thr Leu Lys Asp Gly Glu Lys Val 285 290 295 ACG CCC TTT GAC ACT TTA GCC TCA ATA GAT GCG GTG GTC GCT AGC GAA 1248 Thr Pro Phe Asp Thr Leu Ala Ser Ile Asp Ala Val Val Ala Ser Glu 300 305 310 TTC AGG AAG CAG TTG CCC TAC ATT ATC ACT AGG GCT ATT TTA TCG GCC 1296 Phe Arg Lys Gln Leu Pro Tyr Ile Ile Thr Arg Ala Ile Leu Ser Ala 315 320 325 ACT TTT AAA GTG GGC ATG CAA GCG GTG GCG AAC TAT TAT TTG GGG TTT 1344 Thr Phe Lys Val Gly Met Gln Ala Val Ala Asn Tyr Tyr Leu Gly Phe 330 335 340 GTT GGA GGG TTA GTA ACT TCC TTG TAT TCA GGT GTG AGC ACC TTT GCA 1392 Val 
Gly Gly Leu Val Thr Ser Leu Tyr Ser Gly Val Ser Thr Phe Ala 345 350 355 360 GAC ACT AGA AGC ACG AGC ATT TTT GCC CAT AAA ATC TAC CTC ATG CGC 1440 Asp Thr Arg Ser Thr Ser Ile Phe Ala His Lys Ile Tyr Leu Met Arg 365 370 375 ATT AAA AAC AAA GCC TTT GAA AGT TAT GAA GTT CGA GCC GAT TCC ATT 1488 Ile Lys Asn Lys Ala Phe Glu Ser Tyr Glu Val Arg Ala Asp Ser Ile 380 385 390 GAC GCT TTT TCG TTT TCA TTA AAG CCT TGT AAA AGA TCG CTT GAA AGC 1536 Asp Ala Phe Ser Phe Ser Leu Lys Pro Cys Lys Arg Ser Leu Glu Ser 395 400 405 CCT AAA ATC ATT GAC GCT AGG GAA TTG CTT TCT GGG TTT GTA GCA GCC 1584 Pro Lys Ile Ile Asp Ala Arg Glu Leu Leu Ser Gly Phe Val Ala Ala 410 415 420 CCA CAA ATC TTT TGC TCT AAC CGC CAT AAT ATT TTA TAC GTG CGC AGT 1632 Pro Gln Ile Phe Cys Ser Asn Arg His Asn Ile Leu Tyr Val Arg Ser 425 430 435 440 TTT AAA AAC GGG TTT GTT TTG AGT CGT TTA AAA TGATTTCAAA ACCCCCACCA 1685 Phe Lys Asn Gly Phe Val Leu Ser Arg Leu Lys 445 450 AAGGAATTTT AGTTTTTAAG TGTCGTTGGC ATTAAACGCA AACACGATAT AATTATAAAA 1745 CGATACGAAA ACCTAAATTA AGGGGAAGTC ATGGCTGATA GTTTAGCGGG CATTGATCAA 1805 GTTACGAGTT TGCATAAAAA TAACGAGTTA CAATTGTTGT GTTTCA 1851 476 amino acids amino acid single linear protein internal Signal Sequence 1...25 4 Met Ser Ile Lys Arg Val Arg Leu Lys Ile Phe Val Leu Leu Met Ser -25 -20 -15 -10 Val Ile Leu Gly Ile Ser Leu Thr Gly Cys Ile Gly Tyr Arg Met Asp -5 1 5 Leu Glu His Phe Asn Thr Leu Tyr Tyr Glu Glu Ser Pro Lys Lys Ala 10 15 20 Tyr Glu Tyr Ser Lys Gln Phe Thr Lys Lys Lys Lys Asn Ala Leu Leu 25 30 35 Trp Asp Leu Gln Asn Gly Leu Ser Ala Leu Tyr Ala Arg Asp Tyr Gln 40 45 50 55 Thr Ser Leu Gly Val Leu Asp Gln Ala Glu Gln Arg Phe Asp Lys Thr 60 65 70 Gln Ser Ala Phe Thr Arg Gly Ala Gly Tyr Val Gly Ala Thr Met Ile 75 80 85 Asn Asp Asn Val Arg Ala Tyr Gly Gly Asn Ile Tyr Glu Gly Val Leu 90 95 100 Ile Asn Tyr Tyr Lys Ala Ile Asp Tyr Met Leu Leu Asn Asp Ser Ala 105 110 115 Lys Ala Arg Val Gln Phe Asn Arg Ala Asn Glu Arg Gln Arg Arg Ala 120 125 130 135 Lys Glu Phe Tyr Tyr Glu Glu Val Gln Lys 
Ala Ile Lys Glu Ile Asp 140 145 150 Ser Ser Lys Lys His Asn Ile Asn Met Glu Arg Ser Arg Val Glu Val 155 160 165 Ser Glu Ile Leu Asn Asn Thr Tyr Ser Asn Leu Asp Lys Tyr Glu Ala 170 175 180 Tyr Gln Gly Leu Leu Asn Pro Ala Val Ser Tyr Leu Ser Gly Leu Phe 185 190 195 Tyr Ala Leu Asn Gly Asp Glu Asn Lys Gly Leu Gly Tyr Leu Asn Glu 200 205 210 215 Ala Tyr Gly Ile Ser Gln Ser Pro Phe Val Ala Gln Asp Leu Val Phe 220 225 230 Phe Lys Asn Pro Asn Arg Ser His Phe Thr Trp Ile Ile Ile Glu Asp 235 240 245 Gly Lys Glu Pro Gln Lys Ser Glu Phe Lys Ile Asp Val Pro Ile Phe 250 255 260 Met Ile Asp Ser Val Tyr Asn Val Ser Ile Ala Leu Pro Lys Leu Glu 265 270 275 Lys Gly Glu Ala Phe Tyr Gln Asn Phe Thr Leu Lys Asp Gly Glu Lys 280 285 290 295 Val Thr Pro Phe Asp Thr Leu Ala Ser Ile Asp Ala Val Val Ala Ser 300 305 310 Glu Phe Arg Lys Gln Leu Pro Tyr Ile Ile Thr Arg Ala Ile Leu Ser 315 320 325 Ala Thr Phe Lys Val Gly Met Gln Ala Val Ala Asn Tyr Tyr Leu Gly 330 335 340 Phe Val Gly Gly Leu Val Thr Ser Leu Tyr Ser Gly Val Ser Thr Phe 345 350 355 Ala Asp Thr Arg Ser Thr Ser Ile Phe Ala His Lys Ile Tyr Leu Met 360 365 370 375 Arg Ile Lys Asn Lys Ala Phe Glu Ser Tyr Glu Val Arg Ala Asp Ser 380 385 390 Ile Asp Ala Phe Ser Phe Ser Leu Lys Pro Cys Lys Arg Ser Leu Glu 395 400 405 Ser Pro Lys Ile Ile Asp Ala Arg Glu Leu Leu Ser Gly Phe Val Ala 410 415 420 Ala Pro Gln Ile Phe Cys Ser Asn Arg His Asn Ile Leu Tyr Val Arg 425 430 435 Ser Phe Lys Asn Gly Phe Val Leu Ser Arg Leu Lys 440 445 450 898 base pairs nucleic acid single linear Coding Sequence 86...835 (A) NAME/KEY Signal Sequence (B) LOCATION 86...161 (D) OTHER INFORMATION 5 GCATAAAATA AACAAACATT AAGTAAGGCT TATCAATATT TGATTACAAT TATAAGGGTT 60 ACATTTTTTT AATAGGAGAT ATACC ATG CTA GGA AAC GTT AAA AAA ACC CTT 112 Met Leu Gly Asn Val Lys Lys Thr Leu -25 -20 TTT GGG GTC TTG TGT TTG GGC ACG TTG TGT TTG AGA GGG TTA ATG GCA 160 Phe Gly Val Leu Cys Leu Gly Thr Leu Cys Leu Arg Gly Leu Met Ala -15 -10 -5 GAG CCA GAC GCT AAA GAG CTT GTT AAT TTA GGC ATA GAG AGC 
GCG AAG 208 Glu Pro Asp Ala Lys Glu Leu Val Asn Leu Gly Ile Glu Ser Ala Lys 1 5 10 15 AAG CAA GAT TTC GCT CAA GCT AAA ACG CAT TTT GAA AAA GCT TGT GAG 256 Lys Gln Asp Phe Ala Gln Ala Lys Thr His Phe Glu Lys Ala Cys Glu 20 25 30 TTA AAA AAT GGC TTT GGA TGT GTT TTT TTA GGG GCG TTC TAT GAA GAA 304 Leu Lys Asn Gly Phe Gly Cys Val Phe Leu Gly Ala Phe Tyr Glu Glu 35 40 45 GGG AAA GGA GTG GGA AAA GAC TTG AAA AAA GCC ATC CAA TTT TAC ACT 352 Gly Lys Gly Val Gly Lys Asp Leu Lys Lys Ala Ile Gln Phe Tyr Thr 50 55 60 AAA GGT TGT GAA TTA AAT GAT GGT TAT GGG TGT AAC CTG CTA GGA AAT 400 Lys Gly Cys Glu Leu Asn Asp Gly Tyr Gly Cys Asn Leu Leu Gly Asn 65 70 75 80 TTA TAC TAT AAC GGA CAA GGC GTG TCA AAA GAC GCT AAA AAA GCC TCA 448 Leu Tyr Tyr Asn Gly Gln Gly Val Ser Lys Asp Ala Lys Lys Ala Ser 85 90 95 CAA TAC TAC TCT AAA GCT TGC GAC TTA AAC CAT GCT GAA GGG TGT ATG 496 Gln Tyr Tyr Ser Lys Ala Cys Asp Leu Asn His Ala Glu Gly Cys Met 100 105 110 GTA TTA GGA AGC TTA CAC CAT TAT GGC GTA GGC ACG CCT AAG GAT TTA 544 Val Leu Gly Ser Leu His His Tyr Gly Val Gly Thr Pro Lys Asp Leu 115 120 125 AGA AAG GCT CTT GAT TTG TAT GAA AAA GCT TGC GAT TTA AAA GAC AGC 592 Arg Lys Ala Leu Asp Leu Tyr Glu Lys Ala Cys Asp Leu Lys Asp Ser 130 135 140 CCA GGG TGT ATT AAT GCA GGA TAT ATA TAT AGT GTA ACA AAG AAT TTT 640 Pro Gly Cys Ile Asn Ala Gly Tyr Ile Tyr Ser Val Thr Lys Asn Phe 145 150 155 160 AAG GAG GCT ATC GTT CGT TAT TCT AAA GCA TGC GAA TTA AAA GAT GGT 688 Lys Glu Ala Ile Val Arg Tyr Ser Lys Ala Cys Glu Leu Lys Asp Gly 165 170 175 AGG GGG TGT TAT AAT TTA GGG GTT ATG CAA TAC AAC GCT CAA GGT ACA 736 Arg Gly Cys Tyr Asn Leu Gly Val Met Gln Tyr Asn Ala Gln Gly Thr 180 185 190 GCA AAG GAC GAA AAG CAA GCG GTA GAA AAC TTT AAA AAA GGC TGC AAA 784 Ala Lys Asp Glu Lys Gln Ala Val Glu Asn Phe Lys Lys Gly Cys Lys 195 200 205 TCA AGC GTT AAA GAA GCA TGC GAC GCT CTC AAG GAA TTA AAA ATA GAA 832 Ser Ser Val Lys Glu Ala Cys Asp Ala Leu Lys Glu Leu Lys Ile Glu 210 215 220 CTT TAATTTCAAT GAAGTTAGCT AAACGCTGCG TTTAGCTGGC 
TTTTACGCTT TTTATA 891 Leu 225 TTTTAAG 898 250 amino acids amino acid single linear protein internal Signal Sequence 1...25 6 Met Leu Gly Asn Val Lys Lys Thr Leu Phe Gly Val Leu Cys Leu Gly -25 -20 -15 -10 Thr Leu Cys Leu Arg Gly Leu Met Ala Glu Pro Asp Ala Lys Glu Leu -5 1 5 Val Asn Leu Gly Ile Glu Ser Ala Lys Lys Gln Asp Phe Ala Gln Ala 10 15 20 Lys Thr His Phe Glu Lys Ala Cys Glu Leu Lys Asn Gly Phe Gly Cys 25 30 35 Val Phe Leu Gly Ala Phe Tyr Glu Glu Gly Lys Gly Val Gly Lys Asp 40 45 50 55 Leu Lys Lys Ala Ile Gln Phe Tyr Thr Lys Gly Cys Glu Leu Asn Asp 60 65 70 Gly Tyr Gly Cys Asn Leu Leu Gly Asn Leu Tyr Tyr Asn Gly Gln Gly 75 80 85 Val Ser Lys Asp Ala Lys Lys Ala Ser Gln Tyr Tyr Ser Lys Ala Cys 90 95 100 Asp Leu Asn His Ala Glu Gly Cys Met Val Leu Gly Ser Leu His His 105 110 115 Tyr Gly Val Gly Thr Pro Lys Asp Leu Arg Lys Ala Leu Asp Leu Tyr 120 125 130 135 Glu Lys Ala Cys Asp Leu Lys Asp Ser Pro Gly Cys Ile Asn Ala Gly 140 145 150 Tyr Ile Tyr Ser Val Thr Lys Asn Phe Lys Glu Ala Ile Val Arg Tyr 155 160 165 Ser Lys Ala Cys Glu Leu Lys Asp Gly Arg Gly Cys Tyr Asn Leu Gly 170 175 180 Val Met Gln Tyr Asn Ala Gln Gly Thr Ala Lys Asp Glu Lys Gln Ala 185 190 195 Val Glu Asn Phe Lys Lys Gly Cys Lys Ser Ser Val Lys Glu Ala Cys 200 205 210 215 Asp Ala Leu Lys Glu Leu Lys Ile Glu Leu 220 225 1422 base pairs nucleic acid single linear Coding Sequence 216...1202 (A) NAME/KEY Signal Sequence (B) LOCATION 216...273 (D) OTHER INFORMATION 7 AAATTAAACG AGTTTGGTTT AGAGCCGTAT TTAGGGTTTT TGCACCCCCA TTTAACCAAT 60 GATTTTGAAA ATAACCCTAA TGAGCAATCA GCGCTCTTTG TCTTGCCCCT TTCAGCGGTT 120 AGCGCTCTTA ATGTGCATGC ACTCAAATTT GTGTTGTTGG AAGCGTTACC CTAAAACGCT 180 ATTTTTAAAA TAATCCATTA AAATAAAGGC GAGGA ATG AAA AGA TTT GTT TTG 233 Met Lys Arg Phe Val Leu -15 TTT TTA TTG TTC ATG TGC GTT TGC GTT CAA GCT TAC GCC GAG CAA GAT 281 Phe Leu Leu Phe Met Cys Val Cys Val Gln Ala Tyr Ala Glu Gln Asp -10 -5 1 TAC TTT TTT AGG GAT TTT AAA TCT AGA GAT TTG CCC CAA AAA CTC CAT 329 Tyr Phe Phe Arg Asp Phe Lys Ser 
Arg Asp Leu Pro Gln Lys Leu His 5 10 15 CTT GAT AAA AAG CTC TCC CAA ACA ATA CAG CCA TGC ATG CAA CTT AAC 377 Leu Asp Lys Lys Leu Ser Gln Thr Ile Gln Pro Cys Met Gln Leu Asn 20 25 30 35 GCA TCA AAA CAC TAC ACT TCT ACC GGG GTT AGA GAG CCT GAT AAA TGC 425 Ala Ser Lys His Tyr Thr Ser Thr Gly Val Arg Glu Pro Asp Lys Cys 40 45 50 ACA AAG AGT TTT AAA AAA TCC GCT CTC ATG TCC TAT GAC TTA GCG CTA 473 Thr Lys Ser Phe Lys Lys Ser Ala Leu Met Ser Tyr Asp Leu Ala Leu 55 60 65 GGT TAT TTG GTG AGT AAG AAT AAG CAA TAC GGC TTA AAG GCT ATA GAA 521 Gly Tyr Leu Val Ser Lys Asn Lys Gln Tyr Gly Leu Lys Ala Ile Glu 70 75 80 ATT TTA AAC GCT TGG GCT AAA GAG CTT CAA AGC GTG GAT ACT TAT CAG 569 Ile Leu Asn Ala Trp Ala Lys Glu Leu Gln Ser Val Asp Thr Tyr Gln 85 90 95 AGC GAG GAT AAT ATC AAT TTT TAC ATG CCT TAT ATG AAC ATG GCT TAT 617 Ser Glu Asp Asn Ile Asn Phe Tyr Met Pro Tyr Met Asn Met Ala Tyr 100 105 110 115 TGG TTT GTC AAA AAG GCG TTT CCT AGC CCA GAA TAT GAA GAT TTC ATT 665 Trp Phe Val Lys Lys Ala Phe Pro Ser Pro Glu Tyr Glu Asp Phe Ile 120 125 130 AAG CGG ATG CGC CAG TAT TCT CAA TCA GCT CTT AAC ACT AAC CAT GGG 713 Lys Arg Met Arg Gln Tyr Ser Gln Ser Ala Leu Asn Thr Asn His Gly 135 140 145 GCG TGG GGC ATT CTT TTT GAT GTG AGT TCT GCG CTA GCG TTA GAC GAT 761 Ala Trp Gly Ile Leu Phe Asp Val Ser Ser Ala Leu Ala Leu Asp Asp 150 155 160 AAT GCC CTT TTG CAC AAT AGC GCT AAT CGG TGG CAG GAG TGG GTG TTT 809 Asn Ala Leu Leu His Asn Ser Ala Asn Arg Trp Gln Glu Trp Val Phe 165 170 175 AAA GCC ATA GAT GAG AAT GGG GTT ATT GNT AGC GCG ATC ACT AGG AGC 857 Lys Ala Ile Asp Glu Asn Gly Val Ile Xaa Ser Ala Ile Thr Arg Ser 180 185 190 195 GAT ACG AGC GAT TAT CAT GGC GGC CCT ACA AAG GGC ATT AAG GGG ATA 905 Asp Thr Ser Asp Tyr His Gly Gly Pro Thr Lys Gly Ile Lys Gly Ile 200 205 210 GCT TAT ACC AAT TTC GCG CTT CTT GCG CTA ACC ATA TCA GGC GAA TTG 953 Ala Tyr Thr Asn Phe Ala Leu Leu Ala Leu Thr Ile Ser Gly Glu Leu 215 220 225 CTT TTT GAG AAC GGG TAT GAT TTG TGG GGT AGT GGA GCT GGG AAA AGG 1001 Leu Phe Glu Asn Gly 
Tyr Asp Leu Trp Gly Ser Gly Ala Gly Lys Arg 230 235 240 CTC TCT GTG GCG TAT AAC AAA GTT GCA ACA TGG ATT TTA AAC CCT GAA 1049 Leu Ser Val Ala Tyr Asn Lys Val Ala Thr Trp Ile Leu Asn Pro Glu 245 250 255 ACT TTC CCT TAT TTC CAG CCT AAC CTT ATC GGG GTG CAT AAC AAC GCC 1097 Thr Phe Pro Tyr Phe Gln Pro Asn Leu Ile Gly Val His Asn Asn Ala 260 265 270 275 TAT TTC ATT ATT TTA GCC AAG CAT TAT TCT AGC CCT AGT GCA AAT GAG 1145 Tyr Phe Ile Ile Leu Ala Lys His Tyr Ser Ser Pro Ser Ala Asn Glu 280 285 290 CTT TTA AAG CAA GGC GAT TTA CAC GAA GAT GGT TTC AGG CTG AAA CTC 1193 Leu Leu Lys Gln Gly Asp Leu His Glu Asp Gly Phe Arg Leu Lys Leu 295 300 305 CGA TCG CCA TGAATTTTTC TGTATCCAAG GTTAGCCTTA AGGATGGCCA TGCGCTTTA 1251 Arg Ser Pro 310 ACCTTTTGAT GAATGGTTCA GAAAGTTTGT TTCAGTCAGC ATTATTTACA AAAAGAGTTT 1311 AAAATAAACG CAATTGTATC TCTTGAGTCG TCTTTAGAGT GCAAATGATT ATCAAAATGA 1371 ATCGTTTTAG TTGTAAGCGT GCTTATTTAC ACTAAAATAA TAAGCGTTAT T 1422 329 amino acids amino acid single linear protein internal Signal Sequence 1...19 8 Met Lys Arg Phe Val Leu Phe Leu Leu Phe Met Cys Val Cys Val Gln -15 -10 -5 Ala Tyr Ala Glu Gln Asp Tyr Phe Phe Arg Asp Phe Lys Ser Arg Asp 1 5 10 Leu Pro Gln Lys Leu His Leu Asp Lys Lys Leu Ser Gln Thr Ile Gln 15 20 25 Pro Cys Met Gln Leu Asn Ala Ser Lys His Tyr Thr Ser Thr Gly Val 30 35 40 45 Arg Glu Pro Asp Lys Cys Thr Lys Ser Phe Lys Lys Ser Ala Leu Met 50 55 60 Ser Tyr Asp Leu Ala Leu Gly Tyr Leu Val Ser Lys Asn Lys Gln Tyr 65 70 75 Gly Leu Lys Ala Ile Glu Ile Leu Asn Ala Trp Ala Lys Glu Leu Gln 80 85 90 Ser Val Asp Thr Tyr Gln Ser Glu Asp Asn Ile Asn Phe Tyr Met Pro 95 100 105 Tyr Met Asn Met Ala Tyr Trp Phe Val Lys Lys Ala Phe Pro Ser Pro 110 115 120 125 Glu Tyr Glu Asp Phe Ile Lys Arg Met Arg Gln Tyr Ser Gln Ser Ala 130 135 140 Leu Asn Thr Asn His Gly Ala Trp Gly Ile Leu Phe Asp Val Ser Ser 145 150 155 Ala Leu Ala Leu Asp Asp Asn Ala Leu Leu His Asn Ser Ala Asn Arg 160 165 170 Trp Gln Glu Trp Val Phe Lys Ala Ile Asp Glu Asn Gly Val Ile Xaa 175 180 185 Ser Ala Ile 
Thr Arg Ser Asp Thr Ser Asp Tyr His Gly Gly Pro Thr 190 195 200 205 Lys Gly Ile Lys Gly Ile Ala Tyr Thr Asn Phe Ala Leu Leu Ala Leu 210 215 220 Thr Ile Ser Gly Glu Leu Leu Phe Glu Asn Gly Tyr Asp Leu Trp Gly 225 230 235 Ser Gly Ala Gly Lys Arg Leu Ser Val Ala Tyr Asn Lys Val Ala Thr 240 245 250 Trp Ile Leu Asn Pro Glu Thr Phe Pro Tyr Phe Gln Pro Asn Leu Ile 255 260 265 Gly Val His Asn Asn Ala Tyr Phe Ile Ile Leu Ala Lys His Tyr Ser 270 275 280 285 Ser Pro Ser Ala Asn Glu Leu Leu Lys Gln Gly Asp Leu His Glu Asp 290 295 300 Gly Phe Arg Leu Lys Leu Arg Ser Pro 305 310 1079 base pairs nucleic acid single linear Coding Sequence 169...834 (A) NAME/KEY Signal Sequence (B) LOCATION 169...289 (D) OTHER INFORMATION 9 CAAAAAAAAA AAAAAACAAT TTCAGTTTCT TATTAGCTAG GTTTGATTAA AATGAAAAGC 60 TTTTATGTGT TTAAACTTCA TTGTCTTAAA ACTTTTAAGA GCAATTTTAA AATTCGTTGG 120 CGTATAATAT CCGTTTTGAA TGAACTACTA AAAAAAGGGT TTTAAATA ATG GCT GAA 177 Met Ala Glu -40 AAT TCT TTC AAA AAT GTT TCC ACA CAA CCC AAA GTA TTT TTC TTA TTG 225 Asn Ser Phe Lys Asn Val Ser Thr Gln Pro Lys Val Phe Phe Leu Leu -35 -30 -25 CCA GCT AAA ACC CTG TTT CTT TTA GGA GGC GTT TTT AGC GCG TTT TTT 273 Pro Ala Lys Thr Leu Phe Leu Leu Gly Gly Val Phe Ser Ala Phe Phe -20 -15 -10 ATC CTT ATT GCT GGC TTG GTT TTT TTT GAT TAT GCT CAT TTG ATG GAC 321 Ile Leu Ile Ala Gly Leu Val Phe Phe Asp Tyr Ala His Leu Met Asp -5 1 5 10 AAT GCC ATT TTT AAT TTT GCG CGT TCA ACC CCC TTT AAT TCC AGC CCT 369 Asn Ala Ile Phe Asn Phe Ala Arg Ser Thr Pro Phe Asn Ser Ser Pro 15 20 25 ATT TTA ACT CTA ATC CTC CAA AAT ATC GCT AAT TTA GGC TCT TCT CAA 417 Ile Leu Thr Leu Ile Leu Gln Asn Ile Ala Asn Leu Gly Ser Ser Gln 30 35 40 TTC GTG TTG CCT TTG AGT TTG TTG GTG GGG GTG TTT TTA AGC CTT TAT 465 Phe Val Leu Pro Leu Ser Leu Leu Val Gly Val Phe Leu Ser Leu Tyr 45 50 55 CGC AGA AAC TTA GTG CTT GGG GTG TGG TTT GTG TTA AGC GTG ATC TTG 513 Arg Arg Asn Leu Val Leu Gly Val Trp Phe Val Leu Ser Val Ile Leu 60 65 70 75 TTT GAA GCC CTT TTA GAA TCT TTA AAA CAC CTT TTT GCA TAT TCC 
ATT 561 Phe Glu Ala Leu Leu Glu Ser Leu Lys His Leu Phe Ala Tyr Ser Ile 80 85 90 CAG TGG CTT TCG CGC AGC GCT AAT TTC CCT AAC GCT ACT GCG CTT TCT 609 Gln Trp Leu Ser Arg Ser Ala Asn Phe Pro Asn Ala Thr Ala Leu Ser 95 100 105 TTA GTG CTA TTT TAT GGG TTG CTT ATT TTA TTG ATA CCC CAT TTA ATC 657 Leu Val Leu Phe Tyr Gly Leu Leu Ile Leu Leu Ile Pro His Leu Ile 110 115 120 ACG CAT CAA ACG CTT AAA AAT GTT CTT TTT TAT AGC TTA TTT GGT TTG 705 Thr His Gln Thr Leu Lys Asn Val Leu Phe Tyr Ser Leu Phe Gly Leu 125 130 135 ATT TTT TTA ATA GGG TTA GCA CTG ATT GTT TTA GGG GTT TCT TTC AGT 753 Ile Phe Leu Ile Gly Leu Ala Leu Ile Val Leu Gly Val Ser Phe Ser 140 145 150 155 AGT GTT TTA GGA GGG TTT TGT TTA GGG GCG TTA GGG GCT TGT TTT TCC 801 Ser Val Leu Gly Gly Phe Cys Leu Gly Ala Leu Gly Ala Cys Phe Ser 160 165 170 ATA GGG ATT TAT TTG AGC GTG TTT CAA AAG ATC TAAACGAACG GCTTAAAAGA 854 Ile Gly Ile Tyr Leu Ser Val Phe Gln Lys Ile 175 180 ATGAAAATTT TATCAAGGTT TTAATATTGG ATTTAAAGGT ATTATTGCAA CGGATTGTTG 914 ATTTTTTCAT CAAGCTCAAT AAAAAGCAAA AAATCGCCCT GATTGCAGCT GGGGTTTTGA 974 TCACGGCTTT GCTTGTGTTT TTATTGCTCT ATCCCTTTAA AGAAAAAGAC TACACGCAAG 1034 GGGGTTATGG GGTTTTATTT GAAGGTTTAG ACTCTAGCGA TAACG 1079 222 amino acids amino acid single linear protein internal Signal Sequence 1...40 10 Met Ala Glu Asn Ser Phe Lys Asn Val Ser Thr Gln Pro Lys Val Phe -40 -35 -30 -25 Phe Leu Leu Pro Ala Lys Thr Leu Phe Leu Leu Gly Gly Val Phe Ser -20 -15 -10 Ala Phe Phe Ile Leu Ile Ala Gly Leu Val Phe Phe Asp Tyr Ala His -5 1 5 Leu Met Asp Asn Ala Ile Phe Asn Phe Ala Arg Ser Thr Pro Phe Asn 10 15 20 Ser Ser Pro Ile Leu Thr Leu Ile Leu Gln Asn Ile Ala Asn Leu Gly 25 30 35 40 Ser Ser Gln Phe Val Leu Pro Leu Ser Leu Leu Val Gly Val Phe Leu 45 50 55 Ser Leu Tyr Arg Arg Asn Leu Val Leu Gly Val Trp Phe Val Leu Ser 60 65 70 Val Ile Leu Phe Glu Ala Leu Leu Glu Ser Leu Lys His Leu Phe Ala 75 80 85 Tyr Ser Ile Gln Trp Leu Ser Arg Ser Ala Asn Phe Pro Asn Ala Thr 90 95 100 Ala Leu Ser Leu Val Leu Phe Tyr Gly Leu Leu Ile Leu 
Leu Ile Pro 105 110 115 120 His Leu Ile Thr His Gln Thr Leu Lys Asn Val Leu Phe Tyr Ser Leu 125 130 135 Phe Gly Leu Ile Phe Leu Ile Gly Leu Ala Leu Ile Val Leu Gly Val 140 145 150 Ser Phe Ser Ser Val Leu Gly Gly Phe Cys Leu Gly Ala Leu Gly Ala 155 160 165 Cys Phe Ser Ile Gly Ile Tyr Leu Ser Val Phe Gln Lys Ile 170 175 180 2234 base pairs nucleic acid single linear Coding Sequence 213...2081 (A) NAME/KEY Signal Sequence (B) LOCATION 213...273 (D) OTHER INFORMATION 11 ATCATAAAAT GTAAAAATAC TCAAAGCATC GCATCAAGCA ATATAGCGAT CTGAAAAGAG 60 GCTCACAATT GAGCTAAAGC CCGCTTTTTA GGGATAAATA AAAAGCGTTT TCAAATTGCA 120 TGGGTAACTT TATGGGGCGA AGCGTTTCTA AATTTTGGTA TAATCGCTAG AAATTGTGAG 180 AAAGATTCTA TCTTGTTTGA GTGGGGTTTC GC ATG CGT TTA TTA TTG TGG TGG 233 Met Arg Leu Leu Leu Trp Trp -20 -15 GTA TTG GTA TTA TCG CTC TTT TTA AAT CCT TTG AGA GCG GTT GAA GAG 281 Val Leu Val Leu Ser Leu Phe Leu Asn Pro Leu Arg Ala Val Glu Glu -10 -5 1 CAT GAA ACA GAT GCG GTG GAT TTG TTT TTG ATT TTC AAT CAA ATC AAC 329 His Glu Thr Asp Ala Val Asp Leu Phe Leu Ile Phe Asn Gln Ile Asn 5 10 15 CAG CTC AAT CAA GTC ATT GAA ACT TAC AAA AAA AAC CCT GAA AGA AGC 377 Gln Leu Asn Gln Val Ile Glu Thr Tyr Lys Lys Asn Pro Glu Arg Ser 20 25 30 35 GCT GAA ATC TCT CTG TAT AAC ACC CAA AAG AAT GAC TTG ATT AAA AGT 425 Ala Glu Ile Ser Leu Tyr Asn Thr Gln Lys Asn Asp Leu Ile Lys Ser 40 45 50 TTG ACT TCT AAA GTG TTG AAT GAA AGG GAT AAG ATC GGG ATT GAT ATC 473 Leu Thr Ser Lys Val Leu Asn Glu Arg Asp Lys Ile Gly Ile Asp Ile 55 60 65 AAT CAA AAT TTA AAA GAG CAG GAA AAA ATC AAA AAG CGT TTG TCT AAA 521 Asn Gln Asn Leu Lys Glu Gln Glu Lys Ile Lys Lys Arg Leu Ser Lys 70 75 80 AGC ATT AAT GGC GAT GAT TTC TAC ACT TTC ATG AAA GAC AGA TTG TCT 569 Ser Ile Asn Gly Asp Asp Phe Tyr Thr Phe Met Lys Asp Arg Leu Ser 85 90 95 TTA GAT ATT TTG TTG ATA GAT GAA ATT TTG TAT CGT TTT ATA GAT AAA 617 Leu Asp Ile Leu Leu Ile Asp Glu Ile Leu Tyr Arg Phe Ile Asp Lys 100 105 110 115 ATC AGG AGC AGT ATT GAT ATT TTT AGC GAA CAA AAA GAT GTA GAA AGC 665 Ile Arg 
Ser Ser Ile Asp Ile Phe Ser Glu Gln Lys Asp Val Glu Ser 120 125 130 ATC AGC GAT GCT TTC CTT TTG CGT TTA GGG CAA TTC AAA CTC TAC ACT 713 Ile Ser Asp Ala Phe Leu Leu Arg Leu Gly Gln Phe Lys Leu Tyr Thr 135 140 145 TTC CCT AAA AAT TTA GGC AAT GTC AAA ATG CAT GAA TTA GAG CAG ATG 761 Phe Pro Lys Asn Leu Gly Asn Val Lys Met His Glu Leu Glu Gln Met 150 155 160 TTT AGC GAT TAT GAA TTG CGT TTG AAC ACT TAC ACC GAA GTC TTG CGT 809 Phe Ser Asp Tyr Glu Leu Arg Leu Asn Thr Tyr Thr Glu Val Leu Arg 165 170 175 TAC ATT AAA AAC CAC CCT AAA GAA GTG CTT CCT AAA AAC TTG ATC ATG 857 Tyr Ile Lys Asn His Pro Lys Glu Val Leu Pro Lys Asn Leu Ile Met 180 185 190 195 GAA GTG AAT ATG GAT TTT GTG TTA AAC AAA ATC AGC AAG GTT TTG CCT 905 Glu Val Asn Met Asp Phe Val Leu Asn Lys Ile Ser Lys Val Leu Pro 200 205 210 TTC ACA ACC CAT AGC TTG CAA GTG AGT AAA ATC GTG CTA GCT TTG ACG 953 Phe Thr Thr His Ser Leu Gln Val Ser Lys Ile Val Leu Ala Leu Thr 215 220 225 ATT TTA GCC TTA TTG CTG GGT TTA AGG AAG TTG ATC ACT TGG CTT TTA 1001 Ile Leu Ala Leu Leu Leu Gly Leu Arg Lys Leu Ile Thr Trp Leu Leu 230 235 240 GCC TTA TTG TTA GAT CGT ATT TTT GAA ATC ATG CAG CGC AAT AAA AAA 1049 Ala Leu Leu Leu Asp Arg Ile Phe Glu Ile Met Gln Arg Asn Lys Lys 245 250 255 ATG CAT GTC AAT GTG CAA AAG AGC ATT GTT TCG CCG GTT TCT GTC TTT 1097 Met His Val Asn Val Gln Lys Ser Ile Val Ser Pro Val Ser Val Phe 260 265 270 275 TTA GCC CTA TTT AGT TGC GAT GTG GCT TTA GAT ATT TTC TAC TAC CCT 1145 Leu Ala Leu Phe Ser Cys Asp Val Ala Leu Asp Ile Phe Tyr Tyr Pro 280 285 290 AAC GCA TCG CCC CCT AAA GTT TCT ATG TGG GTG GGC GCG GTG TAT ATC 1193 Asn Ala Ser Pro Pro Lys Val Ser Met Trp Val Gly Ala Val Tyr Ile 295 300 305 ATG CTT TTA GCA TGG TTA GTG ATA GCG CTT TTT AAA GGC TAT GGG GAA 1241 Met Leu Leu Ala Trp Leu Val Ile Ala Leu Phe Lys Gly Tyr Gly Glu 310 315 320 GCG TTA GTT ACG AAT ATG GCT ACC AAA AGC ACG CAC AAT TTT AGA AAA 1289 Ala Leu Val Thr Asn Met Ala Thr Lys Ser Thr His Asn Phe Arg Lys 325 330 335 GAA GTG ATC AAC TTG ATT TTA AAA GTC GTG TAT 
TTT TTG ATC TTT ATT 1337 Glu Val Ile Asn Leu Ile Leu Lys Val Val Tyr Phe Leu Ile Phe Ile 340 345 350 355 GTC GCG CTT TTA GGG GTT TTG AAA CAA CTA GGG TTT AAC GTT TCA GCC 1385 Val Ala Leu Leu Gly Val Leu Lys Gln Leu Gly Phe Asn Val Ser Ala 360 365 370 ATC ATC GCT TCT TTA GGG ATT GGG GGG TTA GCG GTG GCT TTG GCG GTT 1433 Ile Ile Ala Ser Leu Gly Ile Gly Gly Leu Ala Val Ala Leu Ala Val 375 380 385 AAA GAT GTG TTA GCG AAT TTT TTT GCT TCG GTC ATT TTA TTA TTA GAC 1481 Lys Asp Val Leu Ala Asn Phe Phe Ala Ser Val Ile Leu Leu Leu Asp 390 395 400 AAT TCG TTT TCT CAA GGG GAT TGG ATC GTG TGC GGT GAA GTG GAG GGC 1529 Asn Ser Phe Ser Gln Gly Asp Trp Ile Val Cys Gly Glu Val Glu Gly 405 410 415 ACG GTG GTG GAA ATG GGG TTA AGG CGC ACC ACG ATC AGA GCC TTT GAC 1577 Thr Val Val Glu Met Gly Leu Arg Arg Thr Thr Ile Arg Ala Phe Asp 420 425 430 435 AAC GCT CTT TTG TCC GTG CCT AAT TCA GAA TTA GCC GGA AAA CCC ATC 1625 Asn Ala Leu Leu Ser Val Pro Asn Ser Glu Leu Ala Gly Lys Pro Ile 440 445 450 AGG AAT TGG AGC CGT CGT AAA GTG GGA AGG CGT ATT AAA ATG GAA ATA 1673 Arg Asn Trp Ser Arg Arg Lys Val Gly Arg Arg Ile Lys Met Glu Ile 455 460 465 GGC TTA ACT TAT AGC TCC AGT CAA AGC GCT TTA CAG CTT TGC GTG AAA 1721 Gly Leu Thr Tyr Ser Ser Ser Gln Ser Ala Leu Gln Leu Cys Val Lys 470 475 480 GAC ATT AAA GAA ATG TTA GAA AAC CAC CCT AAA ATC GCT AAC GGA GCC 1769 Asp Ile Lys Glu Met Leu Glu Asn His Pro Lys Ile Ala Asn Gly Ala 485 490 495 GAT AGC GCT TTG CAA AAT GTG AGC GAT TAC CGC TAC ATG TTT AAA AAA 1817 Asp Ser Ala Leu Gln Asn Val Ser Asp Tyr Arg Tyr Met Phe Lys Lys 500 505 510 515 GAT ATT GTT TCT ATT GAT GAT TTT TTA GGG TAT AAA AAC AAT TTG TTT 1865 Asp Ile Val Ser Ile Asp Asp Phe Leu Gly Tyr Lys Asn Asn Leu Phe 520 525 530 GTC TTT TTA GAT CAG TTT GCG GAC AGC TCT ATT AAT ATT TTA GTG TAT 1913 Val Phe Leu Asp Gln Phe Ala Asp Ser Ser Ile Asn Ile Leu Val Tyr 535 540 545 TGC TTT TCT AAG ACA GTG GTT TGG GAA GAG TGG CTA GAA GTC AAA GAA 1961 Cys Phe Ser Lys Thr Val Val Trp Glu Glu Trp Leu Glu Val Lys Glu 550 555 560 
GAT GTG ATG CTA AAA ATC ATG GGG ATT GTA GAA AAG CAC CAT TTG AGT 2009 Asp Val Met Leu Lys Ile Met Gly Ile Val Glu Lys His His Leu Ser 565 570 575 TTT GCT TTC CCA TCA CAG AGT TTG TAT GTG GAG AGT TTG CCA GAA GTT 2057 Phe Ala Phe Pro Ser Gln Ser Leu Tyr Val Glu Ser Leu Pro Glu Val 580 585 590 595 AGC CTG AAA GAA GGG GCT AAA ATC TGAAATTATT GGTAGATGTA TTCTTTGGTT 2111 Ser Leu Lys Glu Gly Ala Lys Ile 600 AAGGGGAAAG TGTTATCCAC GCTGTTGGTT AAAAGCAATT GGAATAAATC CGCGCTCCCC 2171 ACCCTAAAGG CGGATGCGCA AGTCCTTAAA TACAGATCCC ACATGCGGAT AAAGCGTTCG 2231 TCA 2234 623 amino acids amino acid single linear protein internal Signal Sequence 1...20 12 Met Arg Leu Leu Leu Trp Trp Val Leu Val Leu Ser Leu Phe Leu Asn -20 -15 -10 -5 Pro Leu Arg Ala Val Glu Glu His Glu Thr Asp Ala Val Asp Leu Phe 1 5 10 Leu Ile Phe Asn Gln Ile Asn Gln Leu Asn Gln Val Ile Glu Thr Tyr 15 20 25 Lys Lys Asn Pro Glu Arg Ser Ala Glu Ile Ser Leu Tyr Asn Thr Gln 30 35 40 Lys Asn Asp Leu Ile Lys Ser Leu Thr Ser Lys Val Leu Asn Glu Arg 45 50 55 60 Asp Lys Ile Gly Ile Asp Ile Asn Gln Asn Leu Lys Glu Gln Glu Lys 65 70 75 Ile Lys Lys Arg Leu Ser Lys Ser Ile Asn Gly Asp Asp Phe Tyr Thr 80 85 90 Phe Met Lys Asp Arg Leu Ser Leu Asp Ile Leu Leu Ile Asp Glu Ile 95 100 105 Leu Tyr Arg Phe Ile Asp Lys Ile Arg Ser Ser Ile Asp Ile Phe Ser 110 115 120 Glu Gln Lys Asp Val Glu Ser Ile Ser Asp Ala Phe Leu Leu Arg Leu 125 130 135 140 Gly Gln Phe Lys Leu Tyr Thr Phe Pro Lys Asn Leu Gly Asn Val Lys 145 150 155 Met His Glu Leu Glu Gln Met Phe Ser Asp Tyr Glu Leu Arg Leu Asn 160 165 170 Thr Tyr Thr Glu Val Leu Arg Tyr Ile Lys Asn His Pro Lys Glu Val 175 180 185 Leu Pro Lys Asn Leu Ile Met Glu Val Asn Met Asp Phe Val Leu Asn 190 195 200 Lys Ile Ser Lys Val Leu Pro Phe Thr Thr His Ser Leu Gln Val Ser 205 210 215 220 Lys Ile Val Leu Ala Leu Thr Ile Leu Ala Leu Leu Leu Gly Leu Arg 225 230 235 Lys Leu Ile Thr Trp Leu Leu Ala Leu Leu Leu Asp Arg Ile Phe Glu 240 245 250 Ile Met Gln Arg Asn Lys Lys Met His Val Asn Val Gln Lys Ser Ile 255 260 265 Val 
Ser Pro Val Ser Val Phe Leu Ala Leu Phe Ser Cys Asp Val Ala 270 275 280 Leu Asp Ile Phe Tyr Tyr Pro Asn Ala Ser Pro Pro Lys Val Ser Met 285 290 295 300 Trp Val Gly Ala Val Tyr Ile Met Leu Leu Ala Trp Leu Val Ile Ala 305 310 315 Leu Phe Lys Gly Tyr Gly Glu Ala Leu Val Thr Asn Met Ala Thr Lys 320 325 330 Ser Thr His Asn Phe Arg Lys Glu Val Ile Asn Leu Ile Leu Lys Val 335 340 345 Val Tyr Phe Leu Ile Phe Ile Val Ala Leu Leu Gly Val Leu Lys Gln 350 355 360 Leu Gly Phe Asn Val Ser Ala Ile Ile Ala Ser Leu Gly Ile Gly Gly 365 370 375 380 Leu Ala Val Ala Leu Ala Val Lys Asp Val Leu Ala Asn Phe Phe Ala 385 390 395 Ser Val Ile Leu Leu Leu Asp Asn Ser Phe Ser Gln Gly Asp Trp Ile 400 405 410 Val Cys Gly Glu Val Glu Gly Thr Val Val Glu Met Gly Leu Arg Arg 415 420 425 Thr Thr Ile Arg Ala Phe Asp Asn Ala Leu Leu Ser Val Pro Asn Ser 430 435 440 Glu Leu Ala Gly Lys Pro Ile Arg Asn Trp Ser Arg Arg Lys Val Gly 445 450 455 460 Arg Arg Ile Lys Met Glu Ile Gly Leu Thr Tyr Ser Ser Ser Gln Ser 465 470 475 Ala Leu Gln Leu Cys Val Lys Asp Ile Lys Glu Met Leu Glu Asn His 480 485 490 Pro Lys Ile Ala Asn Gly Ala Asp Ser Ala Leu Gln Asn Val Ser Asp 495 500 505 Tyr Arg Tyr Met Phe Lys Lys Asp Ile Val Ser Ile Asp Asp Phe Leu 510 515 520 Gly Tyr Lys Asn Asn Leu Phe Val Phe Leu Asp Gln Phe Ala Asp Ser 525 530 535 540 Ser Ile Asn Ile Leu Val Tyr Cys Phe Ser Lys Thr Val Val Trp Glu 545 550 555 Glu Trp Leu Glu Val Lys Glu Asp Val Met Leu Lys Ile Met Gly Ile 560 565 570 Val Glu Lys His His Leu Ser Phe Ala Phe Pro Ser Gln Ser Leu Tyr 575 580 585 Val Glu Ser Leu Pro Glu Val Ser Leu Lys Glu Gly Ala Lys Ile 590 595 600 962 base pairs nucleic acid single linear Coding Sequence 97...912 (A) NAME/KEY Signal Sequence (B) LOCATION 97...217 (D) OTHER INFORMATION 13 TTTTATTGAA TGTGTTGTAA TGTTTTTAAG GTATAATAAA CTCTTTTTAA GTCAAGCAAT 60 AAAGTTTGCA ACCTGATGAG AGTAATAATA GAGTTT ATG CTG ATT TCA TTA AAA 114 Met Leu Ile Ser Leu Lys -40 -35 ACA TTC CTA AAA ATA TTA TTG AAA ATA TTC CTA AAA ACC TTC CAA AAG 162 Thr Phe Leu Lys 
Ile Leu Leu Lys Ile Phe Leu Lys Thr Phe Gln Lys -30 -25 -20 ATT TGG GTA GTT TGC GTT ATT ATT TGG GGG TTA GGC TGT AGT TTT TTA 210 Ile Trp Val Val Cys Val Ile Ile Trp Gly Leu Gly Cys Ser Phe Leu -15 -10 -5 AAC GCT AAC AGC ATT CAA TTA GAA GAA ACG CTC AGA CGA AGC CCT AAA 258 Asn Ala Asn Ser Ile Gln Leu Glu Glu Thr Leu Arg Arg Ser Pro Lys 1 5 10 AAT CTT ATT TGG CAA CAC TTT AAA AAG AAG TTT AAA AAG AGC AAC ACG 306 Asn Leu Ile Trp Gln His Phe Lys Lys Lys Phe Lys Lys Ser Asn Thr 15 20 25 30 ATC CCT TAT GCC CCA AAT AGC CGT TGG AAA TAT TTA GGC ACG AGC ATA 354 Ile Pro Tyr Ala Pro Asn Ser Arg Trp Lys Tyr Leu Gly Thr Ser Ile 35 40 45 GGG ATT TTA GGC GTG TCT TTG GTG ATA GGG ATT GTG GGG CTG TAT CTC 402 Gly Ile Leu Gly Val Ser Leu Val Ile Gly Ile Val Gly Leu Tyr Leu 50 55 60 ATG CCA GAG AGC GTA ACG AAT TGG GAT AAA GAA AAG TTT GGG ATC AAA 450 Met Pro Glu Ser Val Thr Asn Trp Asp Lys Glu Lys Phe Gly Ile Lys 65 70 75 AGT TGG TTT GAA AAT GTC CGC ATG GGG CCA AAA CTG GAC AAT GAT AGT 498 Ser Trp Phe Glu Asn Val Arg Met Gly Pro Lys Leu Asp Asn Asp Ser 80 85 90 TTT ATT TTT AAT GAA ATT TTG CAC CCT TAT TTT GGG GCT ATG TAT TAT 546 Phe Ile Phe Asn Glu Ile Leu His Pro Tyr Phe Gly Ala Met Tyr Tyr 95 100 105 110 ATG CAA CCG CGC ATG GCT GGA TTT AGC TGG ATG GCA TCA GCG TTT TTT 594 Met Gln Pro Arg Met Ala Gly Phe Ser Trp Met Ala Ser Ala Phe Phe 115 120 125 TCT TTT ATC ACT TCC ACG CTT TTT TGG GAA TAT GGC TTG GAA GCG TTT 642 Ser Phe Ile Thr Ser Thr Leu Phe Trp Glu Tyr Gly Leu Glu Ala Phe 130 135 140 GTG GAA GTG CCT AGC TGG CAG GAT TTA GTG ATC ACG CCT TTA TTA GGC 690 Val Glu Val Pro Ser Trp Gln Asp Leu Val Ile Thr Pro Leu Leu Gly 145 150 155 TCC ATT TTA GGG GAG GGG TTT TAT CAG CTC ACG CGC TAT ATC CAA CGC 738 Ser Ile Leu Gly Glu Gly Phe Tyr Gln Leu Thr Arg Tyr Ile Gln Arg 160 165 170 AAT GAA GGC AAG CTT TTT GGC TCT TTA TTT TTA GGG CGT TTA GTC ATC 786 Asn Glu Gly Lys Leu Phe Gly Ser Leu Phe Leu Gly Arg Leu Val Ile 175 180 185 190 GCT CTT ATG GAT CCT ATC GGT TTT ATC ATT AGG GAT TTA GGA CTT GGG 834 Ala Leu 
Met Asp Pro Ile Gly Phe Ile Ile Arg Asp Leu Gly Leu Gly 195 200 205 GAA GCT TTA GGG ATT TAT AAT AAA CAC GAA ATC CGT TCC AGC TTA AGC 882 Glu Ala Leu Gly Ile Tyr Asn Lys His Glu Ile Arg Ser Ser Leu Ser 210 215 220 CCC AAT GGT TTG AAT TTG ACT TAC AAA TTT TAAGAGCTTA AAATTTAAGA AAA 935 Pro Asn Gly Leu Asn Leu Thr Tyr Lys Phe 225 230 TTATAAAGAG TTTTGATAGA ATACCTT 962 272 amino acids amino acid single linear protein internal Signal Sequence 1...40 14 Met Leu Ile Ser Leu Lys Thr Phe Leu Lys Ile Leu Leu Lys Ile Phe -40 -35 -30 -25 Leu Lys Thr Phe Gln Lys Ile Trp Val Val Cys Val Ile Ile Trp Gly -20 -15 -10 Leu Gly Cys Ser Phe Leu Asn Ala Asn Ser Ile Gln Leu Glu Glu Thr -5 1 5 Leu Arg Arg Ser Pro Lys Asn Leu Ile Trp Gln His Phe Lys Lys Lys 10 15 20 Phe Lys Lys Ser Asn Thr Ile Pro Tyr Ala Pro Asn Ser Arg Trp Lys 25 30 35 40 Tyr Leu Gly Thr Ser Ile Gly Ile Leu Gly Val Ser Leu Val Ile Gly 45 50 55 Ile Val Gly Leu Tyr Leu Met Pro Glu Ser Val Thr Asn Trp Asp Lys 60 65 70 Glu Lys Phe Gly Ile Lys Ser Trp Phe Glu Asn Val Arg Met Gly Pro 75 80 85 Lys Leu Asp Asn Asp Ser Phe Ile Phe Asn Glu Ile Leu His Pro Tyr 90 95 100 Phe Gly Ala Met Tyr Tyr Met Gln Pro Arg Met Ala Gly Phe Ser Trp 105 110 115 120 Met Ala Ser Ala Phe Phe Ser Phe Ile Thr Ser Thr Leu Phe Trp Glu 125 130 135 Tyr Gly Leu Glu Ala Phe Val Glu Val Pro Ser Trp Gln Asp Leu Val 140 145 150 Ile Thr Pro Leu Leu Gly Ser Ile Leu Gly Glu Gly Phe Tyr Gln Leu 155 160 165 Thr Arg Tyr Ile Gln Arg Asn Glu Gly Lys Leu Phe Gly Ser Leu Phe 170 175 180 Leu Gly Arg Leu Val Ile Ala Leu Met Asp Pro Ile Gly Phe Ile Ile 185 190 195 200 Arg Asp Leu Gly Leu Gly Glu Ala Leu Gly Ile Tyr Asn Lys His Glu 205 210 215 Ile Arg Ser Ser Leu Ser Pro Asn Gly Leu Asn Leu Thr Tyr Lys Phe 220 225 230 955 base pairs nucleic acid single linear Coding Sequence 126...806 (A) NAME/KEY Signal Sequence (B) LOCATION 126...237 (D) OTHER INFORMATION 15 GTCAGCCTTT AAAGGTTTCA TTATAGCAAA GAATATTATT TTTTTATTCC TTGCGTTTTC 60 TGTGCGTTTG TGGGGCAAAT AAGATATAAT CGCCTTTTTA 
AAATTCATTT TTTAAAGGGG 120 TTTGA ATG GTA TTT GAC AGA ACA ATC AGC GTA AGA GAA AAA AAA GCG GCT 170 Met Val Phe Asp Arg Thr Ile Ser Val Arg Glu Lys Lys Ala Ala -35 -30 -25 AAA ACG CTT GGG ATT ATT GGG ATC GTC TTT TTT ATT TTG TTT GGC ATC 218 Lys Thr Leu Gly Ile Ile Gly Ile Val Phe Phe Ile Leu Phe Gly Ile -20 -15 -10 GTG ATA AGC GGG GTG GCT TTT CAA AAA GAG TGG GTG CAA CAA TTG GAT 266 Val Ile Ser Gly Val Ala Phe Gln Lys Glu Trp Val Gln Gln Leu Asp -5 1 5 10 TTA TTT TTT ATA GAC TTG ATC CGC AAC CCT GCC CCC ATT CAA AAA AGC 314 Leu Phe Phe Ile Asp Leu Ile Arg Asn Pro Ala Pro Ile Gln Lys Ser 15 20 25 GCG TGG CTT TCT TTC GTG TTT TTT AGC ACT TGG TTT GCA CAA AGC AAG 362 Ala Trp Leu Ser Phe Val Phe Phe Ser Thr Trp Phe Ala Gln Ser Lys 30 35 40 CTC ACC ACT CCT ATA GCC TTA CTC ATT GGC TTG TGG TTT GGG TTT CAA 410 Leu Thr Thr Pro Ile Ala Leu Leu Ile Gly Leu Trp Phe Gly Phe Gln 45 50 55 AAA CGC ATC GCT TTG GGG GTG TGG TTT TTC TTT AGC ATC TTA TTA GGT 458 Lys Arg Ile Ala Leu Gly Val Trp Phe Phe Phe Ser Ile Leu Leu Gly 60 65 70 GAA TTC ACC TTA AAA TCC CTT AAG CTT TTA GTG GCG CGC CCA CGG CCT 506 Glu Phe Thr Leu Lys Ser Leu Lys Leu Leu Val Ala Arg Pro Arg Pro 75 80 85 90 GTA ACC AAT GGC GAA TTG GTT TTC GCG CAT GGC TTT AGT TTC CCT AGC 554 Val Thr Asn Gly Glu Leu Val Phe Ala His Gly Phe Ser Phe Pro Ser 95 100 105 GGG CAT GCT TTG GCT TCA GCG CTT TTT TAC GGC TCT TTG GCG TTG TTG 602 Gly His Ala Leu Ala Ser Ala Leu Phe Tyr Gly Ser Leu Ala Leu Leu 110 115 120 TTA TGC TAT TCT AAC GCC AAC AAT CGC ATT AAA ACG ATT ATT GCT GTG 650 Leu Cys Tyr Ser Asn Ala Asn Asn Arg Ile Lys Thr Ile Ile Ala Val 125 130 135 GTT TTG CTT TTT TGG ATT TTT TTA ATG GCG TAT GAT AGG GTT TAT TTA 698 Val Leu Leu Phe Trp Ile Phe Leu Met Ala Tyr Asp Arg Val Tyr Leu 140 145 150 GGG GTG CAT TAC CCT AGC GAT GTT TTA GGA GGG TTT TTA TTA GGG ATT 746 Gly Val His Tyr Pro Ser Asp Val Leu Gly Gly Phe Leu Leu Gly Ile 155 160 165 170 GCT TGG TCG TGC TGC TCT TTA GCG CTT TAT TTA GGG TTT TTG AAA CGC 794 Ala Trp Ser Cys Cys Ser Leu Ala Leu Tyr Leu 
Gly Phe Leu Lys Arg 175 180 185 CCT TAT AAT CAA TAAAGGCTTT ATTTAACCAA ACACTGACAA CTAAAATTTT TAAAA 851 Pro Tyr Asn Gln 190 TTCTATTTTT TGATAAAACT CATTCTCTTA AGGGGATAGG GGGTATTTTG CGATAATACC 911 CCCTTAACCC CCTTAAGAAA CCCCCTAACC CCCAAGACCG CTTT 955 227 amino acids amino acid single linear protein internal Signal Sequence 1...37 16 Met Val Phe Asp Arg Thr Ile Ser Val Arg Glu Lys Lys Ala Ala Lys -35 -30 -25 Thr Leu Gly Ile Ile Gly Ile Val Phe Phe Ile Leu Phe Gly Ile Val -20 -15 -10 Ile Ser Gly Val Ala Phe Gln Lys Glu Trp Val Gln Gln Leu Asp Leu -5 1 5 10 Phe Phe Ile Asp Leu Ile Arg Asn Pro Ala Pro Ile Gln Lys Ser Ala 15 20 25 Trp Leu Ser Phe Val Phe Phe Ser Thr Trp Phe Ala Gln Ser Lys Leu 30 35 40 Thr Thr Pro Ile Ala Leu Leu Ile Gly Leu Trp Phe Gly Phe Gln Lys 45 50 55 Arg Ile Ala Leu Gly Val Trp Phe Phe Phe Ser Ile Leu Leu Gly Glu 60 65 70 75 Phe Thr Leu Lys Ser Leu Lys Leu Leu Val Ala Arg Pro Arg Pro Val 80 85 90 Thr Asn Gly Glu Leu Val Phe Ala His Gly Phe Ser Phe Pro Ser Gly 95 100 105 His Ala Leu Ala Ser Ala Leu Phe Tyr Gly Ser Leu Ala Leu Leu Leu 110 115 120 Cys Tyr Ser Asn Ala Asn Asn Arg Ile Lys Thr Ile Ile Ala Val Val 125 130 135 Leu Leu Phe Trp Ile Phe Leu Met Ala Tyr Asp Arg Val Tyr Leu Gly 140 145 150 155 Val His Tyr Pro Ser Asp Val Leu Gly Gly Phe Leu Leu Gly Ile Ala 160 165 170 Trp Ser Cys Cys Ser Leu Ala Leu Tyr Leu Gly Phe Leu Lys Arg Pro 175 180 185 Tyr Asn Gln 190 3280 base pairs nucleic acid single linear Coding Sequence 151...3207 (A) NAME/KEY Signal Sequence (B) LOCATION 151...241 (D) OTHER INFORMATION 17 TAAAGGTTTT AGGCCTGTGG TGGTTCAAGT TTTAGAAGAG CGCAGCAAGA TTTTTATCGT 60 GAACGCTCAA AATTTACACC CTAATGACAG CGTGGCAGTG GGGTCATTGA TAGGGTTAAA 120 AGGCATGATC AACAATTTAG GGGAGGAATG ATG CTC GCT TCC ATT ATT GAA TTT 174 Met Leu Ala Ser Ile Ile Glu Phe -30 -25 TCC TTA CGC CAA AGA GTG ATC GTG ATT GTT GGT GCG ATT CTT ATT TTA 222 Ser Leu Arg Gln Arg Val Ile Val Ile Val Gly Ala Ile Leu Ile Leu -20 -15 -10 TTT TTT GGG ACT TAT AGT TTT ATC AAC ACT CCA GTG GAC GCT TTC 
CCG 270 Phe Phe Gly Thr Tyr Ser Phe Ile Asn Thr Pro Val Asp Ala Phe Pro -5 1 5 10 GAT ATT TCG CCC ACT CAA GTT AAA ATC ATT TTA AAA CTC CCC GGC TCT 318 Asp Ile Ser Pro Thr Gln Val Lys Ile Ile Leu Lys Leu Pro Gly Ser 15 20 25 AGC CCT GAA GAA ATG GAA AAC AAC ATC GTG CGC CCT TTA GAA TTG GAG 366 Ser Pro Glu Glu Met Glu Asn Asn Ile Val Arg Pro Leu Glu Leu Glu 30 35 40 CTT TTA GGC TTG AAA GGG CAA AAA TCT TTA AGG AGT GTT TCA AAA TAT 414 Leu Leu Gly Leu Lys Gly Gln Lys Ser Leu Arg Ser Val Ser Lys Tyr 45 50 55 TCT ATT TCA GAT ATT ACG ATA GAT TTT GAT GAC AGC GTG GAT ATT TAT 462 Ser Ile Ser Asp Ile Thr Ile Asp Phe Asp Asp Ser Val Asp Ile Tyr 60 65 70 TTA GCG AGG AAT ATT GTC AAT GAG CGC TTG AGC AGC GTG ATG AAA GAT 510 Leu Ala Arg Asn Ile Val Asn Glu Arg Leu Ser Ser Val Met Lys Asp 75 80 85 90 TTA CCC GTG GGG GTT GAG GGG GGC ATG GCG CCC ATT GTT ACG CCG CTA 558 Leu Pro Val Gly Val Glu Gly Gly Met Ala Pro Ile Val Thr Pro Leu 95 100 105 TCA GAT ATC TTT ATG TTC ACT ATT GAT GGC AAT ATC ACT GAG ATA GAA 606 Ser Asp Ile Phe Met Phe Thr Ile Asp Gly Asn Ile Thr Glu Ile Glu 110 115 120 AAA CGA CAG CTT TTA GAT TTT GTG ATC CGC CCA CAA TTA AGA ATG ATT 654 Lys Arg Gln Leu Leu Asp Phe Val Ile Arg Pro Gln Leu Arg Met Ile 125 130 135 AGC GGC GTA GCA GAT GTC AAT TCC ATT GGA GGC TTT AGC AGA GCG TTT 702 Ser Gly Val Ala Asp Val Asn Ser Ile Gly Gly Phe Ser Arg Ala Phe 140 145 150 GTG ATC GTG CCG GAT TTT AAT GAC ATG GCA AGG CTT GGG GTG AGT ATT 750 Val Ile Val Pro Asp Phe Asn Asp Met Ala Arg Leu Gly Val Ser Ile 155 160 165 170 TCT GAT TTA GAA TCG GCT GTG AGA GTG AAT TTA AGA AAC AGC GGA GCG 798 Ser Asp Leu Glu Ser Ala Val Arg Val Asn Leu Arg Asn Ser Gly Ala 175 180 185 GGG CGC GTG GAT AGA GAT GGC GAA ACC TTT TTA GTC AAA ATC CAA ACC 846 Gly Arg Val Asp Arg Asp Gly Glu Thr Phe Leu Val Lys Ile Gln Thr 190 195 200 GCT TCT TTG AGT TTA GAA GAC ATT GGC AAA ATC ACC GTT TCC ACT AAT 894 Ala Ser Leu Ser Leu Glu Asp Ile Gly Lys Ile Thr Val Ser Thr Asn 205 210 215 TTA GGG CAT TTG CAC ATT AAG GAT TTT GCG AAA GTC ATC 
AGC CAG TCT 942 Leu Gly His Leu His Ile Lys Asp Phe Ala Lys Val Ile Ser Gln Ser 220 225 230 CGC ACC CGT TTG GGG TTT GTT ACT AAA GAT GGC GTG GGC GAG ACC ACA 990 Arg Thr Arg Leu Gly Phe Val Thr Lys Asp Gly Val Gly Glu Thr Thr 235 240 245 250 GAA GGC TTG GTG CTT TCT TTA AAA GAC GCT AAC ACC AAA GAA ATC ATC 1038 Glu Gly Leu Val Leu Ser Leu Lys Asp Ala Asn Thr Lys Glu Ile Ile 255 260 265 ACT CAA GTG TAT CAA AAA CTA GAA GAA TTA AAA CCC TTT TTA CCG AAT 1086 Thr Gln Val Tyr Gln Lys Leu Glu Glu Leu Lys Pro Phe Leu Pro Asn 270 275 280 GGC GTG TCC ATT AAT GTT TTT TAT GAT CGC TCA GAA TTT ACG CAA AAA 1134 Gly Val Ser Ile Asn Val Phe Tyr Asp Arg Ser Glu Phe Thr Gln Lys 285 290 295 GCC ATT GCC ACC GTT TCT AAA ACC CTC ATT GAA GCC GTT GTT TTA ATC 1182 Ala Ile Ala Thr Val Ser Lys Thr Leu Ile Glu Ala Val Val Leu Ile 300 305 310 ATC ATC ACG CTC TTT TTA TTT TTA GGG AAT TTG AGG GCG AGC GTG GCT 1230 Ile Ile Thr Leu Phe Leu Phe Leu Gly Asn Leu Arg Ala Ser Val Ala 315 320 325 330 GTG GGG GTG ATT TTA CCT TTA AGC TTG TCC GTG GCG TTT ATT TTT ATC 1278 Val Gly Val Ile Leu Pro Leu Ser Leu Ser Val Ala Phe Ile Phe Ile 335 340 345 AAG TTT AGC GAT CTG ACT TTA AAT TTG ATG AGT TTA GGG GGA TTG GTT 1326 Lys Phe Ser Asp Leu Thr Leu Asn Leu Met Ser Leu Gly Gly Leu Val 350 355 360 ATC GCT ATA GGC ATG CTC ATT GAC TCA GCC GTG GTG GTG GTG GAA AAC 1374 Ile Ala Ile Gly Met Leu Ile Asp Ser Ala Val Val Val Val Glu Asn 365 370 375 GCT TTT GAA AAA TTA AGC GCT AAC ACT AAA ACC ACT AAA CTC CAT GCA 1422 Ala Phe Glu Lys Leu Ser Ala Asn Thr Lys Thr Thr Lys Leu His Ala 380 385 390 ATC TAT CGT TCG TGT AAA GAA ATC GCT GTT TCA GTG GTG AGC GGG GTG 1470 Ile Tyr Arg Ser Cys Lys Glu Ile Ala Val Ser Val Val Ser Gly Val 395 400 405 410 GTG ATC ATC ATT GTG TTT TTT GTG CCG ATT TTA ACC TTA CAG GGG TTA 1518 Val Ile Ile Ile Val Phe Phe Val Pro Ile Leu Thr Leu Gln Gly Leu 415 420 425 GAG GGT AAG ATG TTT AGG CCT TTA GCG CAA AGC ATT GTG TAT GCG CTT 1566 Glu Gly Lys Met Phe Arg Pro Leu Ala Gln Ser Ile Val Tyr Ala Leu 430 435 440 TTA GGC ACT 
TTA GTT CTA TCT ATT ACA ATC ATT CCT GTA GTC AGC TCT 1614 Leu Gly Thr Leu Val Leu Ser Ile Thr Ile Ile Pro Val Val Ser Ser 445 450 455 CTT GTC TTA AAA GCC ACG CCC CAT AGC GAA ACC TTT TTA ACG AGG TTT 1662 Leu Val Leu Lys Ala Thr Pro His Ser Glu Thr Phe Leu Thr Arg Phe 460 465 470 TTA AAC AGA ATC TAC GCC CCT TTA TTG GAA TTT TTT GTG CAT AAC CCT 1710 Leu Asn Arg Ile Tyr Ala Pro Leu Leu Glu Phe Phe Val His Asn Pro 475 480 485 490 AAA AAA GTG ATT TTA GGA GCG TTT GTT TTT TTA ATC GCA AGC CTT TCT 1758 Lys Lys Val Ile Leu Gly Ala Phe Val Phe Leu Ile Ala Ser Leu Ser 495 500 505 TTA TTC CCT TTT GTG GGG AAG AAT TTC ATG CCC GTT TTA GAT GAG GGC 1806 Leu Phe Pro Phe Val Gly Lys Asn Phe Met Pro Val Leu Asp Glu Gly 510 515 520 GAT GTG GTT TTG AGC GTG GAA ACC ACC CCT TCT ATT TCT TTA GAT CAA 1854 Asp Val Val Leu Ser Val Glu Thr Thr Pro Ser Ile Ser Leu Asp Gln 525 530 535 TCT AGG GAT CTC ATG CTA AAC ATT GAG AGC GCG ATT AAA AAG CAT GTC 1902 Ser Arg Asp Leu Met Leu Asn Ile Glu Ser Ala Ile Lys Lys His Val 540 545 550 AAG GAA GTT AAA AGC ATT GTC GCG CGC ACA GGG AGC GAT GAA TTG GGG 1950 Lys Glu Val Lys Ser Ile Val Ala Arg Thr Gly Ser Asp Glu Leu Gly 555 560 565 570 CTG GAT TTA GGA GGT TTG AAT CAA ACC GAT ACT TTT ATT TCT TTT ATT 1998 Leu Asp Leu Gly Gly Leu Asn Gln Thr Asp Thr Phe Ile Ser Phe Ile 575 580 585 CCT AAA AAA GAA TGG AGC GTT AAA ACC AAA GAT GAA TTA TTA GAA AAA 2046 Pro Lys Lys Glu Trp Ser Val Lys Thr Lys Asp Glu Leu Leu Glu Lys 590 595 600 ATC ATG GAT TCT TTA AAA GAC TTT AAG GGG ATT AAC TTT TCT TTC ACC 2094 Ile Met Asp Ser Leu Lys Asp Phe Lys Gly Ile Asn Phe Ser Phe Thr 605 610 615 CAA CCC ATT GAA ATG AGA ATT TCT GAA ATG CTG ACA GGG GTT AGG GGG 2142 Gln Pro Ile Glu Met Arg Ile Ser Glu Met Leu Thr Gly Val Arg Gly 620 625 630 GAT TTA GCG GTT AAG ATT TTT GGA GAT GGT ATT AGC GAA TTG AAT GAA 2190 Asp Leu Ala Val Lys Ile Phe Gly Asp Gly Ile Ser Glu Leu Asn Glu 635 640 645 650 TTG AGT TTT CAA ATC GCG CAA GCT CTA AAA GGG ATT AAA GGA TCT AGT 2238 Leu Ser Phe Gln Ile Ala Gln Ala Leu Lys Gly 
Ile Lys Gly Ser Ser 655 660 665 GAA GTT TTA ACC ACG CTT AAT GAG GGC GTG AAT TAT TTG TAT GTA ACC 2286 Glu Val Leu Thr Thr Leu Asn Glu Gly Val Asn Tyr Leu Tyr Val Thr 670 675 680 CCT AAT AAA GAA TCG ATG GCG GAT GTG GGG ATC ACT AGC GAT GAA TTT 2334 Pro Asn Lys Glu Ser Met Ala Asp Val Gly Ile Thr Ser Asp Glu Phe 685 690 695 TCC AAG TTT TTA AAA TCC GCT TTA GAG GGC TTG GTT GTA GAT GTG ATC 2382 Ser Lys Phe Leu Lys Ser Ala Leu Glu Gly Leu Val Val Asp Val Ile 700 705 710 CCT ACA GGG ATT TCA CGC ACG CCA GTG ATG ATC CGC CAA GAG AGC GAT 2430 Pro Thr Gly Ile Ser Arg Thr Pro Val Met Ile Arg Gln Glu Ser Asp 715 720 725 730 TTT GCA AGC TCT ATC ACT AAA ATC AAA AGT TTA GCC TTG ACT TCA AAA 2478 Phe Ala Ser Ser Ile Thr Lys Ile Lys Ser Leu Ala Leu Thr Ser Lys 735 740 745 TAT GGC GTT TTA GTG CCT ATC ACT TCT ATC GCC AAA ATT GAA GAA GTG 2526 Tyr Gly Val Leu Val Pro Ile Thr Ser Ile Ala Lys Ile Glu Glu Val 750 755 760 GAT GGC CCT GTT TCT GTT GTG CGT GAA AAT TCA ATG CGC ATG AGC GTG 2574 Asp Gly Pro Val Ser Val Val Arg Glu Asn Ser Met Arg Met Ser Val 765 770 775 GTT CGC AGT AAT GTG GTG GGG CGC GAT TTG AAA TCT TTT GTA GAA GAG 2622 Val Arg Ser Asn Val Val Gly Arg Asp Leu Lys Ser Phe Val Glu Glu 780 785 790 GCT AAA AAA GTG ATC GCT CAA AAC ATC AAA CTC CCT CCC AGC TAC TAT 2670 Ala Lys Lys Val Ile Ala Gln Asn Ile Lys Leu Pro Pro Ser Tyr Tyr 795 800 805 810 ATC ACT TAT GGG GGG CAG TTT GAA AAC CAG CAA CGG GCC AAT AAA AGG 2718 Ile Thr Tyr Gly Gly Gln Phe Glu Asn Gln Gln Arg Ala Asn Lys Arg 815 820 825 CTC TCC ACC GTT ATC CCT TTA AGC ATC TTA GCG ATT TTT TTC ATT CTT 2766 Leu Ser Thr Val Ile Pro Leu Ser Ile Leu Ala Ile Phe Phe Ile Leu 830 835 840 TTT TTC ACT TTT AAA AGC ATT CCT TTA GCC TTG CTC ATT CTT TTG AAT 2814 Phe Phe Thr Phe Lys Ser Ile Pro Leu Ala Leu Leu Ile Leu Leu Asn 845 850 855 ATC CCT TTT GCG GTT ACC GGA GGC CTT ATT GCG TTG TTT GCG GTC GGG 2862 Ile Pro Phe Ala Val Thr Gly Gly Leu Ile Ala Leu Phe Ala Val Gly 860 865 870 GAG TAT ATT TCA GTG CCA GCG AGC GTG GGC TTT ATC GCT CTT TTT GGG 2910 Glu 
Tyr Ile Ser Val Pro Ala Ser Val Gly Phe Ile Ala Leu Phe Gly 875 880 885 890 ATT GCG GTT TTA AAT GGC GTG GTG ATG ATA GGC TAT TTT AAA GAG CTT 2958 Ile Ala Val Leu Asn Gly Val Val Met Ile Gly Tyr Phe Lys Glu Leu 895 900 905 CTC TTG CAA GGG AAA AGC GTA GAA GAA TGC GTT TTA TTG GGC GCT AAA 3006 Leu Leu Gln Gly Lys Ser Val Glu Glu Cys Val Leu Leu Gly Ala Lys 910 915 920 AGG CGT TTG AGA CCG GTT TTA ATG ACC GCT TGC ATT GCC GGT TTG GGT 3054 Arg Arg Leu Arg Pro Val Leu Met Thr Ala Cys Ile Ala Gly Leu Gly 925 930 935 TTG CTC CCT TTA TTA TTT TCT CAT AGC GTG GGA TCA GAA GTC CAA AAA 3102 Leu Leu Pro Leu Leu Phe Ser His Ser Val Gly Ser Glu Val Gln Lys 940 945 950 CCT TTA GCG ATC GTG GTG CTT GGA GGC TTG GTT ACC TCA AGC GCT CTA 3150 Pro Leu Ala Ile Val Val Leu Gly Gly Leu Val Thr Ser Ser Ala Leu 955 960 965 970 ACC TTA CTC CTA CTG CCG CCA ATG TTT ATG CTC ATC GCT AAA AAG ATT 3198 Thr Leu Leu Leu Leu Pro Pro Met Phe Met Leu Ile Ala Lys Lys Ile 975 980 985 AAA ATC GTT TGAGTTAAAG GATTTCACAT GCTCGCTTTA GAAATTTATA TTGATATTT 3256 Lys Ile Val GTTTGAAAGA CGCTTTAATA GATT 3280 1019 amino acids amino acid single linear protein internal Signal Sequence 1...30 18 Met Leu Ala Ser Ile Ile Glu Phe Ser Leu Arg Gln Arg Val Ile Val -30 -25 -20 -15 Ile Val Gly Ala Ile Leu Ile Leu Phe Phe Gly Thr Tyr Ser Phe Ile -10 -5 1 Asn Thr Pro Val Asp Ala Phe Pro Asp Ile Ser Pro Thr Gln Val Lys 5 10 15 Ile Ile Leu Lys Leu Pro Gly Ser Ser Pro Glu Glu Met Glu Asn Asn 20 25 30 Ile Val Arg Pro Leu Glu Leu Glu Leu Leu Gly Leu Lys Gly Gln Lys 35 40 45 50 Ser Leu Arg Ser Val Ser Lys Tyr Ser Ile Ser Asp Ile Thr Ile Asp 55 60 65 Phe Asp Asp Ser Val Asp Ile Tyr Leu Ala Arg Asn Ile Val Asn Glu 70 75 80 Arg Leu Ser Ser Val Met Lys Asp Leu Pro Val Gly Val Glu Gly Gly 85 90 95 Met Ala Pro Ile Val Thr Pro Leu Ser Asp Ile Phe Met Phe Thr Ile 100 105 110 Asp Gly Asn Ile Thr Glu Ile Glu Lys Arg Gln Leu Leu Asp Phe Val 115 120 125 130 Ile Arg Pro Gln Leu Arg Met Ile Ser Gly Val Ala Asp Val Asn Ser 135 140 145 Ile Gly Gly Phe Ser Arg 
Ala Phe Val Ile Val Pro Asp Phe Asn Asp 150 155 160 Met Ala Arg Leu Gly Val Ser Ile Ser Asp Leu Glu Ser Ala Val Arg 165 170 175 Val Asn Leu Arg Asn Ser Gly Ala Gly Arg Val Asp Arg Asp Gly Glu 180 185 190 Thr Phe Leu Val Lys Ile Gln Thr Ala Ser Leu Ser Leu Glu Asp Ile 195 200 205 210 Gly Lys Ile Thr Val Ser Thr Asn Leu Gly His Leu His Ile Lys Asp 215 220 225 Phe Ala Lys Val Ile Ser Gln Ser Arg Thr Arg Leu Gly Phe Val Thr 230 235 240 Lys Asp Gly Val Gly Glu Thr Thr Glu Gly Leu Val Leu Ser Leu Lys 245 250 255 Asp Ala Asn Thr Lys Glu Ile Ile Thr Gln Val Tyr Gln Lys Leu Glu 260 265 270 Glu Leu Lys Pro Phe Leu Pro Asn Gly Val Ser Ile Asn Val Phe Tyr 275 280 285 290 Asp Arg Ser Glu Phe Thr Gln Lys Ala Ile Ala Thr Val Ser Lys Thr 295 300 305 Leu Ile Glu Ala Val Val Leu Ile Ile Ile Thr Leu Phe Leu Phe Leu 310 315 320 Gly Asn Leu Arg Ala Ser Val Ala Val Gly Val Ile Leu Pro Leu Ser 325 330 335 Leu Ser Val Ala Phe Ile Phe Ile Lys Phe Ser Asp Leu Thr Leu Asn 340 345 350 Leu Met Ser Leu Gly Gly Leu Val Ile Ala Ile Gly Met Leu Ile Asp 355 360 365 370 Ser Ala Val Val Val Val Glu Asn Ala Phe Glu Lys Leu Ser Ala Asn 375 380 385 Thr Lys Thr Thr Lys Leu His Ala Ile Tyr Arg Ser Cys Lys Glu Ile 390 395 400 Ala Val Ser Val Val Ser Gly Val Val Ile Ile Ile Val Phe Phe Val 405 410 415 Pro Ile Leu Thr Leu Gln Gly Leu Glu Gly Lys Met Phe Arg Pro Leu 420 425 430 Ala Gln Ser Ile Val Tyr Ala Leu Leu Gly Thr Leu Val Leu Ser Ile 435 440 445 450 Thr Ile Ile Pro Val Val Ser Ser Leu Val Leu Lys Ala Thr Pro His 455 460 465 Ser Glu Thr Phe Leu Thr Arg Phe Leu Asn Arg Ile Tyr Ala Pro Leu 470 475 480 Leu Glu Phe Phe Val His Asn Pro Lys Lys Val Ile Leu Gly Ala Phe 485 490 495 Val Phe Leu Ile Ala Ser Leu Ser Leu Phe Pro Phe Val Gly Lys Asn 500 505 510 Phe Met Pro Val Leu Asp Glu Gly Asp Val Val Leu Ser Val Glu Thr 515 520 525 530 Thr Pro Ser Ile Ser Leu Asp Gln Ser Arg Asp Leu Met Leu Asn Ile 535 540 545 Glu Ser Ala Ile Lys Lys His Val Lys Glu Val Lys Ser Ile Val Ala 550 555 560 Arg Thr Gly Ser Asp Glu Leu 
Gly Leu Asp Leu Gly Gly Leu Asn Gln 565 570 575 Thr Asp Thr Phe Ile Ser Phe Ile Pro Lys Lys Glu Trp Ser Val Lys 580 585 590 Thr Lys Asp Glu Leu Leu Glu Lys Ile Met Asp Ser Leu Lys Asp Phe 595 600 605 610 Lys Gly Ile Asn Phe Ser Phe Thr Gln Pro Ile Glu Met Arg Ile Ser 615 620 625 Glu Met Leu Thr Gly Val Arg Gly Asp Leu Ala Val Lys Ile Phe Gly 630 635 640 Asp Gly Ile Ser Glu Leu Asn Glu Leu Ser Phe Gln Ile Ala Gln Ala 645 650 655 Leu Lys Gly Ile Lys Gly Ser Ser Glu Val Leu Thr Thr Leu Asn Glu 660 665 670 Gly Val Asn Tyr Leu Tyr Val Thr Pro Asn Lys Glu Ser Met Ala Asp 675 680 685 690 Val Gly Ile Thr Ser Asp Glu Phe Ser Lys Phe Leu Lys Ser Ala Leu 695 700 705 Glu Gly Leu Val Val Asp Val Ile Pro Thr Gly Ile Ser Arg Thr Pro 710 715 720 Val Met Ile Arg Gln Glu Ser Asp Phe Ala Ser Ser Ile Thr Lys Ile 725 730 735 Lys Ser Leu Ala Leu Thr Ser Lys Tyr Gly Val Leu Val Pro Ile Thr 740 745 750 Ser Ile Ala Lys Ile Glu Glu Val Asp Gly Pro Val Ser Val Val Arg 755 760 765 770 Glu Asn Ser Met Arg Met Ser Val Val Arg Ser Asn Val Val Gly Arg 775 780 785 Asp Leu Lys Ser Phe Val Glu Glu Ala Lys Lys Val Ile Ala Gln Asn 790 795 800 Ile Lys Leu Pro Pro Ser Tyr Tyr Ile Thr Tyr Gly Gly Gln Phe Glu 805 810 815 Asn Gln Gln Arg Ala Asn Lys Arg Leu Ser Thr Val Ile Pro Leu Ser 820 825 830 Ile Leu Ala Ile Phe Phe Ile Leu Phe Phe Thr Phe Lys Ser Ile Pro 835 840 845 850 Leu Ala Leu Leu Ile Leu Leu Asn Ile Pro Phe Ala Val Thr Gly Gly 855 860 865 Leu Ile Ala Leu Phe Ala Val Gly Glu Tyr Ile Ser Val Pro Ala Ser 870 875 880 Val Gly Phe Ile Ala Leu Phe Gly Ile Ala Val Leu Asn Gly Val Val 885 890 895 Met Ile Gly Tyr Phe Lys Glu Leu Leu Leu Gln Gly Lys Ser Val Glu 900 905 910 Glu Cys Val Leu Leu Gly Ala Lys Arg Arg Leu Arg Pro Val Leu Met 915 920 925 930 Thr Ala Cys Ile Ala Gly Leu Gly Leu Leu Pro Leu Leu Phe Ser His 935 940 945 Ser Val Gly Ser Glu Val Gln Lys Pro Leu Ala Ile Val Val Leu Gly 950 955 960 Gly Leu Val Thr Ser Ser Ala Leu Thr Leu Leu Leu Leu Pro Pro Met 965 970 975 Phe Met Leu Ile Ala Lys Lys Ile 
Lys Ile Val 980 985 1183 base pairs nucleic acid single linear Coding Sequence 91...1032 (A) NAME/KEY Signal Sequence (B) LOCATION 91...148 (D) OTHER INFORMATION 19 CTTAAAAGAA ACTTCGCAAA CCTTTTTATA TTATTTTAAA AGCACTAATA TTTATTATAT 60 TAGTTACAAC TATTTATTGT AAAGGCTAAA ATG TTG AAA TTT AAA TAT GGT TTG 114 Met Leu Lys Phe Lys Tyr Gly Leu -15 ATT TAT ATC GCG CTC ATA CTA GGA CTT CAA GCG ACA GAT TAT GAC AAT 162 Ile Tyr Ile Ala Leu Ile Leu Gly Leu Gln Ala Thr Asp Tyr Asp Asn -10 -5 1 5 TTA GAA GAA GAA AAC CAA CAA TTA GAT GAA AAA ATA AAC CAT TTA AAG 210 Leu Glu Glu Glu Asn Gln Gln Leu Asp Glu Lys Ile Asn His Leu Lys 10 15 20 CAA CAG CTC ACC GAA AAA GGG GTT TCG CCC AAA GAG ATG GAT AAG GAT 258 Gln Gln Leu Thr Glu Lys Gly Val Ser Pro Lys Glu Met Asp Lys Asp 25 30 35 AAG TTT GAA GAA GAA TAC ATC AAT CGA TCT TAT CCT AAA ATT TCT TCC 306 Lys Phe Glu Glu Glu Tyr Ile Asn Arg Ser Tyr Pro Lys Ile Ser Ser 40 45 50 AAG AAA AAA GAG AAA TTG CTC AAA TCT TTT TCC ATA GCC GAT GAT AAG 354 Lys Lys Lys Glu Lys Leu Leu Lys Ser Phe Ser Ile Ala Asp Asp Lys 55 60 65 AGT GGG GTT TTT TTA GGG GGT GGG TAT GCT TAT GGG GAA CTT AAC TTG 402 Ser Gly Val Phe Leu Gly Gly Gly Tyr Ala Tyr Gly Glu Leu Asn Leu 70 75 80 85 TCT TAT CAA GGG GAA ATG TTA GAC AGA TAC GGC GCG AAT GCC CCT AGC 450 Ser Tyr Gln Gly Glu Met Leu Asp Arg Tyr Gly Ala Asn Ala Pro Ser 90 95 100 GCG TTT AAA AAC AAT ATC AAT ATT AAC GCT CCT GTT TCT ATG ATT AGC 498 Ala Phe Lys Asn Asn Ile Asn Ile Asn Ala Pro Val Ser Met Ile Ser 105 110 115 GCT AAA TTT GGG TAT CAA AAA TAC TTT GTG TCT TAT TTT GGG ACA CGA 546 Ala Lys Phe Gly Tyr Gln Lys Tyr Phe Val Ser Tyr Phe Gly Thr Arg 120 125 130 TTT TAT GGG GAT TTA TTG CTT GGG GGT GGG GCA TTA AAA GAG GAT GCA 594 Phe Tyr Gly Asp Leu Leu Leu Gly Gly Gly Ala Leu Lys Glu Asp Ala 135 140 145 ATC AAG CAG CCT GTA GGC TCG TTT ATT TAT GTT TTA GGG GCT GTC AAT 642 Ile Lys Gln Pro Val Gly Ser Phe Ile Tyr Val Leu Gly Ala Val Asn 150 155 160 165 ACC GAT TTA TTG TTT GAT ATG CCT TTA GAT TTT AAA ACT AAA AAG CAT 690 Thr Asp Leu Leu Phe 
Asp Met Pro Leu Asp Phe Lys Thr Lys Lys His 170 175 180 TTT TTA GGC GTT TAT GCG GGT TTT GGG ATA GGG CTT ATG CTC TAT CAA 738 Phe Leu Gly Val Tyr Ala Gly Phe Gly Ile Gly Leu Met Leu Tyr Gln 185 190 195 GAC AGG CCT AAT CAA AAC GGG AGG AAT TTA GTA GTG GGG GGC TAT TCA 786 Asp Arg Pro Asn Gln Asn Gly Arg Asn Leu Val Val Gly Gly Tyr Ser 200 205 210 AGC CCT AAT TTT TTA TGG AAA TCT TTG ATT GAA GTG GAT TAC ACT TTT 834 Ser Pro Asn Phe Leu Trp Lys Ser Leu Ile Glu Val Asp Tyr Thr Phe 215 220 225 AAT GTG GGC GTG AGT TTA ACG CTT TAT AGG AAA CAC CGT TTA GAG ATT 882 Asn Val Gly Val Ser Leu Thr Leu Tyr Arg Lys His Arg Leu Glu Ile 230 235 240 245 GGC ACA AAA TTG CCG ATT AGC TAT TTG AGA ATG GGA GTG GAA GAG GGA 930 Gly Thr Lys Leu Pro Ile Ser Tyr Leu Arg Met Gly Val Glu Glu Gly 250 255 260 GCG ATT TAT CAA AAT AAA GAA GAT GAT GAG CGT TTG TTG GTT TCG GCT 978 Ala Ile Tyr Gln Asn Lys Glu Asp Asp Glu Arg Leu Leu Val Ser Ala 265 270 275 AAC AAC CAG TTC AAG CGA TCC AGT TTT TTA TTA GTG AAT TAT GCG TTT 1026 Asn Asn Gln Phe Lys Arg Ser Ser Phe Leu Leu Val Asn Tyr Ala Phe 280 285 290 ATT TTT TAAGGCTTGA TCTTGGAGTT AAGGTTTAAA ATTTTAGCGT TAGTCGTTTT AA 1084 Ile Phe 295 TTTTAGGGGG TTATTTGATT TTTAACGCTT TAATCACAAA ACCCAGAGCT TTAAGTTTTA 1144 GTTTAAATAG CAAAGAGGGT GCGCTTAATG ACAATGATG 1183 314 amino acids amino acid single linear protein internal Signal Sequence 1...19 20 Met Leu Lys Phe Lys Tyr Gly Leu Ile Tyr Ile Ala Leu Ile Leu Gly -15 -10 -5 Leu Gln Ala Thr Asp Tyr Asp Asn Leu Glu Glu Glu Asn Gln Gln Leu 1 5 10 Asp Glu Lys Ile Asn His Leu Lys Gln Gln Leu Thr Glu Lys Gly Val 15 20 25 Ser Pro Lys Glu Met Asp Lys Asp Lys Phe Glu Glu Glu Tyr Ile Asn 30 35 40 45 Arg Ser Tyr Pro Lys Ile Ser Ser Lys Lys Lys Glu Lys Leu Leu Lys 50 55 60 Ser Phe Ser Ile Ala Asp Asp Lys Ser Gly Val Phe Leu Gly Gly Gly 65 70 75 Tyr Ala Tyr Gly Glu Leu Asn Leu Ser Tyr Gln Gly Glu Met Leu Asp 80 85 90 Arg Tyr Gly Ala Asn Ala Pro Ser Ala Phe Lys Asn Asn Ile Asn Ile 95 100 105 Asn Ala Pro Val Ser Met Ile Ser Ala Lys Phe Gly Tyr Gln 
Lys Tyr 110 115 120 125 Phe Val Ser Tyr Phe Gly Thr Arg Phe Tyr Gly Asp Leu Leu Leu Gly 130 135 140 Gly Gly Ala Leu Lys Glu Asp Ala Ile Lys Gln Pro Val Gly Ser Phe 145 150 155 Ile Tyr Val Leu Gly Ala Val Asn Thr Asp Leu Leu Phe Asp Met Pro 160 165 170 Leu Asp Phe Lys Thr Lys Lys His Phe Leu Gly Val Tyr Ala Gly Phe 175 180 185 Gly Ile Gly Leu Met Leu Tyr Gln Asp Arg Pro Asn Gln Asn Gly Arg 190 195 200 205 Asn Leu Val Val Gly Gly Tyr Ser Ser Pro Asn Phe Leu Trp Lys Ser 210 215 220 Leu Ile Glu Val Asp Tyr Thr Phe Asn Val Gly Val Ser Leu Thr Leu 225 230 235 Tyr Arg Lys His Arg Leu Glu Ile Gly Thr Lys Leu Pro Ile Ser Tyr 240 245 250 Leu Arg Met Gly Val Glu Glu Gly Ala Ile Tyr Gln Asn Lys Glu Asp 255 260 265 Asp Glu Arg Leu Leu Val Ser Ala Asn Asn Gln Phe Lys Arg Ser Ser 270 275 280 285 Phe Leu Leu Val Asn Tyr Ala Phe Ile Phe 290 295 2185 base pairs nucleic acid single linear Coding Sequence 81...2069 (A) NAME/KEY Signal Sequence (B) LOCATION 81...144 (D) OTHER INFORMATION 21 GTAAAAAATG GCTTATCTGT TCTAGCCTAC TCCCCTTATT TTTTCTTAAT CCCTTAGCGG 60 CAGAAGATGA TGGGTTTTTT ATG GGG GTG AGT TAT CAA ACT TCT CTA GCT 110 Met Gly Val Ser Tyr Gln Thr Ser Leu Ala -20 -15 ATT CAA AGG GTG GAT AAC TCA GGG CTT AAC GCC AGT CAA GCC GCA TCC 158 Ile Gln Arg Val Asp Asn Ser Gly Leu Asn Ala Ser Gln Ala Ala Ser -10 -5 1 5 ACC TAC ATC CGC CAG AAC GCT ATC GCT CTA GAA TCT GCG GCG GTG CCT 206 Thr Tyr Ile Arg Gln Asn Ala Ile Ala Leu Glu Ser Ala Ala Val Pro 10 15 20 TTA GCC TAT TAT TTA GAA GCG ATG GGC CAA CAA ACC AGG GTT TTA ATG 254 Leu Ala Tyr Tyr Leu Glu Ala Met Gly Gln Gln Thr Arg Val Leu Met 25 30 35 CAA ATG CTC TGC CCT GAT CCT TCC AAA CGC TGT TTG CTC TAT GCT GGA 302 Gln Met Leu Cys Pro Asp Pro Ser Lys Arg Cys Leu Leu Tyr Ala Gly 40 45 50 GGT TAT AAA AAC GGA TCA AGT AAT ACT AAC GGC GAT ACA GGC AAC AAC 350 Gly Tyr Lys Asn Gly Ser Ser Asn Thr Asn Gly Asp Thr Gly Asn Asn 55 60 65 CCC CCA AGA GGC AAT GTC AAT GCC ACC TTT GAT ATG CAA TCT CTA GTC 398 Pro Pro Arg Gly Asn Val Asn Ala Thr Phe Asp Met Gln Ser 
Leu Val 70 75 80 85 AAT AAT TTA AAC AAG CTC ACC CAA CTC ATC GGC GAG ACT TTA ATC CGT 446 Asn Asn Leu Asn Lys Leu Thr Gln Leu Ile Gly Glu Thr Leu Ile Arg 90 95 100 AAC CCT GAA AAT CTT TCT AAC GCC AAA GTC TTT AAT GTC AAA TTT GGC 494 Asn Pro Glu Asn Leu Ser Asn Ala Lys Val Phe Asn Val Lys Phe Gly 105 110 115 AAT CAA AGC ACT GTT ATT GCA TTG CCT GAG GGT CTA GCC AAT ACC ATG 542 Asn Gln Ser Thr Val Ile Ala Leu Pro Glu Gly Leu Ala Asn Thr Met 120 125 130 AAC GCT TTA AAC GAT GAT ATT ACC AAC GCT TTA ACC ACG CTC TGG TAT 590 Asn Ala Leu Asn Asp Asp Ile Thr Asn Ala Leu Thr Thr Leu Trp Tyr 135 140 145 AAC CAA ACC TTA ACG AAT AAA TCT TTT AAT AGC GGT AAT TCC GTG AAT 638 Asn Gln Thr Leu Thr Asn Lys Ser Phe Asn Ser Gly Asn Ser Val Asn 150 155 160 165 TTT AGC CCC CAA GTC TTG CAA CAC CTT TTA CAA GAC GGC TTA GCC ACA 686 Phe Ser Pro Gln Val Leu Gln His Leu Leu Gln Asp Gly Leu Ala Thr 170 175 180 AGT AAT CAA ACC ATT TGC AGC ACT CAA AAC CAA TGC ACC GCC ACC AAT 734 Ser Asn Gln Thr Ile Cys Ser Thr Gln Asn Gln Cys Thr Ala Thr Asn 185 190 195 GAA GCT AAA TCT ATC GCT CAA AAC GCC CAA AAC ATC TTC CAG GCT TTA 782 Glu Ala Lys Ser Ile Ala Gln Asn Ala Gln Asn Ile Phe Gln Ala Leu 200 205 210 ATG CAA GCA GGG ATT TTA GGG GGC TTA GCC AAT GAA AAG CAA TTT GGC 830 Met Gln Ala Gly Ile Leu Gly Gly Leu Ala Asn Glu Lys Gln Phe Gly 215 220 225 TTC ACT TAC AAC AAA GCC CCT AAT GGT AGC GAT TCC CAA CAA GGC TAC 878 Phe Thr Tyr Asn Lys Ala Pro Asn Gly Ser Asp Ser Gln Gln Gly Tyr 230 235 240 245 CAA AGC TTT AGC GGC CCG GGT TAT TAC ACT AAA AAC GGC GCT AAT GGC 926 Gln Ser Phe Ser Gly Pro Gly Tyr Tyr Thr Lys Asn Gly Ala Asn Gly 250 255 260 ACT ACC CAA GCG CCC TTG AAA GCA TTA CCC GCT GGA GCG ACA ATT GGA 974 Thr Thr Gln Ala Pro Leu Lys Ala Leu Pro Ala Gly Ala Thr Ile Gly 265 270 275 TCA GGC AAT GGC CAA TAC ACC TAC CAC CCC AGC TCG GCA GTC TAT TAT 1022 Ser Gly Asn Gly Gln Tyr Thr Tyr His Pro Ser Ser Ala Val Tyr Tyr 280 285 290 TTA GCC GAT AGC ATC ATT GCT AAT GGC ATC ACC GCT TCT ATG ATT TTT 1070 Leu Ala Asp Ser Ile Ile Ala Asn 
Gly Ile Thr Ala Ser Met Ile Phe 295 300 305 TCA GGC ATG CAA AAT TTC GCC AAT AAA GCC GCT AAA CTG ACA GGC ACT 1118 Ser Gly Met Gln Asn Phe Ala Asn Lys Ala Ala Lys Leu Thr Gly Thr 310 315 320 325 TCA AGC TAT AGC CAG ATG CAA GAT GCG ATC AAT TAC GGG GAA AGC TTG 1166 Ser Ser Tyr Ser Gln Met Gln Asp Ala Ile Asn Tyr Gly Glu Ser Leu 330 335 340 CTC AGT AAC ACC GTA GCG TAT GGG GAT TTC ATC ACC AAT TGG GTC GCC 1214 Leu Ser Asn Thr Val Ala Tyr Gly Asp Phe Ile Thr Asn Trp Val Ala 345 350 355 CCC TAT TTG GAT TTA AAC AAC AAA GGT TTG AAT TTC TTG CCT AGC TAT 1262 Pro Tyr Leu Asp Leu Asn Asn Lys Gly Leu Asn Phe Leu Pro Ser Tyr 360 365 370 GGG GGG CAA TTG AAT GGT GCT AAT CAT CAA ACC CCA CAA TTA ACC CCG 1310 Gly Gly Gln Leu Asn Gly Ala Asn His Gln Thr Pro Gln Leu Thr Pro 375 380 385 CAA CAA GCC CAA CAA GAG CAA AAA GTC ATC ATG AAC CAA CTA GAG CAA 1358 Gln Gln Ala Gln Gln Glu Gln Lys Val Ile Met Asn Gln Leu Glu Gln 390 395 400 405 GCC ACA AAC GCC CCC ACC CCC GCG CAA ATA AAC AGG ATT TTA GCC AAC 1406 Ala Thr Asn Ala Pro Thr Pro Ala Gln Ile Asn Arg Ile Leu Ala Asn 410 415 420 CCC TAT TCC CCC ACG GCA AAA ACT TTA ATG GCT TAT GGG CTT TAT CGC 1454 Pro Tyr Ser Pro Thr Ala Lys Thr Leu Met Ala Tyr Gly Leu Tyr Arg 425 430 435 TCT AAA GCA GTG ATT GGC GGG GTG ATT GAT GAA ATG CAA ACT AAA GTG 1502 Ser Lys Ala Val Ile Gly Gly Val Ile Asp Glu Met Gln Thr Lys Val 440 445 450 AAT CAA GTC TAT CAA ATG GGC TTT GCT AGG AAT TTT TTG GAG CAT AAC 1550 Asn Gln Val Tyr Gln Met Gly Phe Ala Arg Asn Phe Leu Glu His Asn 455 460 465 TCT AAT TCT AAT AAC ATG AAC GGC TTT GGC GTG AAA ATG GGC TAT AAG 1598 Ser Asn Ser Asn Asn Met Asn Gly Phe Gly Val Lys Met Gly Tyr Lys 470 475 480 485 CAA TTC TTT GGC AAA AAG CGC ATG TTT GGG CTT AGG TAT TAT GGT TTT 1646 Gln Phe Phe Gly Lys Lys Arg Met Phe Gly Leu Arg Tyr Tyr Gly Phe 490 495 500 TAT GAT TTT GGT TAC GCT CAA TTT GGC GCA GAA TCT TCT TTA GTG AAA 1694 Tyr Asp Phe Gly Tyr Ala Gln Phe Gly Ala Glu Ser Ser Leu Val Lys 505 510 515 GCC ACC CTC TCT AGC TAT GGG GCA GGC ACA GAC TTT CTT TAT 
AAT GTT 1742 Ala Thr Leu Ser Ser Tyr Gly Ala Gly Thr Asp Phe Leu Tyr Asn Val 520 525 530 TTT ACC CGA AAA AGA GGG ACT GAA GCG ATA GAT ATC GGT TTT TTT GCC 1790 Phe Thr Arg Lys Arg Gly Thr Glu Ala Ile Asp Ile Gly Phe Phe Ala 535 540 545 GGT ATC CAA CTT GCA GGG CAA ACT TGG AAA ACG AAT TTT TTA GAT CAA 1838 Gly Ile Gln Leu Ala Gly Gln Thr Trp Lys Thr Asn Phe Leu Asp Gln 550 555 560 565 GTG GAT GGC AAC CAT CTT AAA CCC AAA GAC ACT TCT TTC CAA TTC CTT 1886 Val Asp Gly Asn His Leu Lys Pro Lys Asp Thr Ser Phe Gln Phe Leu 570 575 580 TTT GAT TTA GGC ATA AGG ACC AAT TTT TCC AAA ATC GCT CAT CAA AAA 1934 Phe Asp Leu Gly Ile Arg Thr Asn Phe Ser Lys Ile Ala His Gln Lys 585 590 595 AGA TCC CGT TTT TCT CAA GGG ATA GAA TTT GGC CTT AAA ATA CCG GTG 1982 Arg Ser Arg Phe Ser Gln Gly Ile Glu Phe Gly Leu Lys Ile Pro Val 600 605 610 CTT TAT CAC ACC TAT TAC CAA TCA GAA GGC GTT ACA GCG AAG TAT AGA 2030 Leu Tyr His Thr Tyr Tyr Gln Ser Glu Gly Val Thr Ala Lys Tyr Arg 615 620 625 AGA GCC TTT AGT TTT TAT GTG GGC TAC AAC ATA GGC TTT TGATTAAACA AA 2081 Arg Ala Phe Ser Phe Tyr Val Gly Tyr Asn Ile Gly Phe 630 635 640 ATAAGGGAAA AATATGATAA AAAAAGCTAG AAAATTCATA CCATTCTTTT TAATTGGCTC 2141 CCTCTTAGCT GAAGACAATG GCTGGTATAT GTCTGTAGGC TATC 2185 663 amino acids amino acid single linear protein internal Signal Sequence 1...21 22 Met Gly Val Ser Tyr Gln Thr Ser Leu Ala Ile Gln Arg Val Asp Asn -20 -15 -10 Ser Gly Leu Asn Ala Ser Gln Ala Ala Ser Thr Tyr Ile Arg Gln Asn -5 1 5 10 Ala Ile Ala Leu Glu Ser Ala Ala Val Pro Leu Ala Tyr Tyr Leu Glu 15 20 25 Ala Met Gly Gln Gln Thr Arg Val Leu Met Gln Met Leu Cys Pro Asp 30 35 40 Pro Ser Lys Arg Cys Leu Leu Tyr Ala Gly Gly Tyr Lys Asn Gly Ser 45 50 55 Ser Asn Thr Asn Gly Asp Thr Gly Asn Asn Pro Pro Arg Gly Asn Val 60 65 70 75 Asn Ala Thr Phe Asp Met Gln Ser Leu Val Asn Asn Leu Asn Lys Leu 80 85 90 Thr Gln Leu Ile Gly Glu Thr Leu Ile Arg Asn Pro Glu Asn Leu Ser 95 100 105 Asn Ala Lys Val Phe Asn Val Lys Phe Gly Asn Gln Ser Thr Val Ile 110 115 120 Ala Leu Pro Glu Gly Leu 
Ala Asn Thr Met Asn Ala Leu Asn Asp Asp 125 130 135 Ile Thr Asn Ala Leu Thr Thr Leu Trp Tyr Asn Gln Thr Leu Thr Asn 140 145 150 155 Lys Ser Phe Asn Ser Gly Asn Ser Val Asn Phe Ser Pro Gln Val Leu 160 165 170 Gln His Leu Leu Gln Asp Gly Leu Ala Thr Ser Asn Gln Thr Ile Cys 175 180 185 Ser Thr Gln Asn Gln Cys Thr Ala Thr Asn Glu Ala Lys Ser Ile Ala 190 195 200 Gln Asn Ala Gln Asn Ile Phe Gln Ala Leu Met Gln Ala Gly Ile Leu 205 210 215 Gly Gly Leu Ala Asn Glu Lys Gln Phe Gly Phe Thr Tyr Asn Lys Ala 220 225 230 235 Pro Asn Gly Ser Asp Ser Gln Gln Gly Tyr Gln Ser Phe Ser Gly Pro 240 245 250 Gly Tyr Tyr Thr Lys Asn Gly Ala Asn Gly Thr Thr Gln Ala Pro Leu 255 260 265 Lys Ala Leu Pro Ala Gly Ala Thr Ile Gly Ser Gly Asn Gly Gln Tyr 270 275 280 Thr Tyr His Pro Ser Ser Ala Val Tyr Tyr Leu Ala Asp Ser Ile Ile 285 290 295 Ala Asn Gly Ile Thr Ala Ser Met Ile Phe Ser Gly Met Gln Asn Phe 300 305 310 315 Ala Asn Lys Ala Ala Lys Leu Thr Gly Thr Ser Ser Tyr Ser Gln Met 320 325 330 Gln Asp Ala Ile Asn Tyr Gly Glu Ser Leu Leu Ser Asn Thr Val Ala 335 340 345 Tyr Gly Asp Phe Ile Thr Asn Trp Val Ala Pro Tyr Leu Asp Leu Asn 350 355 360 Asn Lys Gly Leu Asn Phe Leu Pro Ser Tyr Gly Gly Gln Leu Asn Gly 365 370 375 Ala Asn His Gln Thr Pro Gln Leu Thr Pro Gln Gln Ala Gln Gln Glu 380 385 390 395 Gln Lys Val Ile Met Asn Gln Leu Glu Gln Ala Thr Asn Ala Pro Thr 400 405 410 Pro Ala Gln Ile Asn Arg Ile Leu Ala Asn Pro Tyr Ser Pro Thr Ala 415 420 425 Lys Thr Leu Met Ala Tyr Gly Leu Tyr Arg Ser Lys Ala Val Ile Gly 430 435 440 Gly Val Ile Asp Glu Met Gln Thr Lys Val Asn Gln Val Tyr Gln Met 445 450 455 Gly Phe Ala Arg Asn Phe Leu Glu His Asn Ser Asn Ser Asn Asn Met 460 465 470 475 Asn Gly Phe Gly Val Lys Met Gly Tyr Lys Gln Phe Phe Gly Lys Lys 480 485 490 Arg Met Phe Gly Leu Arg Tyr Tyr Gly Phe Tyr Asp Phe Gly Tyr Ala 495 500 505 Gln Phe Gly Ala Glu Ser Ser Leu Val Lys Ala Thr Leu Ser Ser Tyr 510 515 520 Gly Ala Gly Thr Asp Phe Leu Tyr Asn Val Phe Thr Arg Lys Arg Gly 525 530 535 Thr Glu Ala Ile Asp Ile Gly 
Phe Phe Ala Gly Ile Gln Leu Ala Gly 540 545 550 555 Gln Thr Trp Lys Thr Asn Phe Leu Asp Gln Val Asp Gly Asn His Leu 560 565 570 Lys Pro Lys Asp Thr Ser Phe Gln Phe Leu Phe Asp Leu Gly Ile Arg 575 580 585 Thr Asn Phe Ser Lys Ile Ala His Gln Lys Arg Ser Arg Phe Ser Gln 590 595 600 Gly Ile Glu Phe Gly Leu Lys Ile Pro Val Leu Tyr His Thr Tyr Tyr 605 610 615 Gln Ser Glu Gly Val Thr Ala Lys Tyr Arg Arg Ala Phe Ser Phe Tyr 620 625 630 635 Val Gly Tyr Asn Ile Gly Phe 640 1080 base pairs nucleic acid single linear Coding Sequence 157...987 (A) NAME/KEY Signal Sequence (B) LOCATION 157...226 (D) OTHER INFORMATION 23 AGCGGTAAAA TCGCTGAAGA AAACAACGCT AAAGAATTTT TTAACCACCC GAAATCTCAA 60 AGAGCGCAAA AATTTTTAGA AACTTTCCAT TTTTTAGGGA GCTGTTAAAT AAAGTTTGCT 120 AAAAAGATGA TTCTAATTTC AAAAAAAGGT GTTTTT ATG AAA ACA AAC GGG CTT 174 Met Lys Thr Asn Gly Leu -20 TTT AAA ATG TGG GGG CTG TTT TTA GTT TTA ATC GCT TTA GTC TTT AAT 222 Phe Lys Met Trp Gly Leu Phe Leu Val Leu Ile Ala Leu Val Phe Asn -15 -10 -5 GCA TGT TCT GAT AGC CAT AAA GAA AAA AAG GAC GCT TTA GAA GTC ATT 270 Ala Cys Ser Asp Ser His Lys Glu Lys Lys Asp Ala Leu Glu Val Ile 1 5 10 15 AAA CAA AGA GGG GTT TTA AAA GTG GGG GTT TTT AGC GAT AAG CCT CCT 318 Lys Gln Arg Gly Val Leu Lys Val Gly Val Phe Ser Asp Lys Pro Pro 20 25 30 TTT GGC TCT GTG GAT TCT AAA GGG AAA TAT CAA GGC TAT GAT GTA GTT 366 Phe Gly Ser Val Asp Ser Lys Gly Lys Tyr Gln Gly Tyr Asp Val Val 35 40 45 ATT GCT AAA CGC ATG GCT CTT GAT TTA TTG GGC GAT GAA AAT AAG ATT 414 Ile Ala Lys Arg Met Ala Leu Asp Leu Leu Gly Asp Glu Asn Lys Ile 50 55 60 GAG TTT ATT CCT GTA GAA GCT TCA GCT AGG GTG GAA TTT TTA AAA GCC 462 Glu Phe Ile Pro Val Glu Ala Ser Ala Arg Val Glu Phe Leu Lys Ala 65 70 75 AAT AAA GTG GAT ATT ATC ATG GCT AAT TTC ACG CGC ACT AAA GAA AGA 510 Asn Lys Val Asp Ile Ile Met Ala Asn Phe Thr Arg Thr Lys Glu Arg 80 85 90 95 GAA AAA GTC GTG GAT TTC GCT AAG CCG TAT ATG AAA GTC GCT TTA GGG 558 Glu Lys Val Val Asp Phe Ala Lys Pro Tyr Met Lys Val Ala Leu Gly 100 105 110 GTG GTT TCT 
AAA GAT GGG GTC ATT AAA AAT ATA GAA GAG TTG AAA GAT 606 Val Val Ser Lys Asp Gly Val Ile Lys Asn Ile Glu Glu Leu Lys Asp 115 120 125 AAA GAG TTG ATT GTG AAT AAA GGC ACG ACA GCG GAT TTT TAT TTC ACT 654 Lys Glu Leu Ile Val Asn Lys Gly Thr Thr Ala Asp Phe Tyr Phe Thr 130 135 140 AAA AAT TAC CCC AAT ATC AAG CTT TTG AAA TTT GAG CAA AAT ACA GAG 702 Lys Asn Tyr Pro Asn Ile Lys Leu Leu Lys Phe Glu Gln Asn Thr Glu 145 150 155 ACT TTT TTA GCC CTT TTA AAC AAT AAG GCT ACC GCT CTA GCC CAT GAC 750 Thr Phe Leu Ala Leu Leu Asn Asn Lys Ala Thr Ala Leu Ala His Asp 160 165 170 175 AAC ACT TTA TTG CTC GCT TGG ACG AAA CAA CAC CCT GAA TTT AAA TTA 798 Asn Thr Leu Leu Leu Ala Trp Thr Lys Gln His Pro Glu Phe Lys Leu 180 185 190 GGC ATT ACA AGC CTT GGC GAT AAG GAT GTG ATC GCT CCA GCG ATT AAA 846 Gly Ile Thr Ser Leu Gly Asp Lys Asp Val Ile Ala Pro Ala Ile Lys 195 200 205 AAA GGC AAC CCC AAG CTT TTA GAA TGG TTG AAT AAC GAA ATA GAT TCC 894 Lys Gly Asn Pro Lys Leu Leu Glu Trp Leu Asn Asn Glu Ile Asp Ser 210 215 220 CTC ATT TCT AGC GAC TTC TTA AAA GAA GCT TAT CAA GAG ACT TTA GCA 942 Leu Ile Ser Ser Asp Phe Leu Lys Glu Ala Tyr Gln Glu Thr Leu Ala 225 230 235 CCT GTT TAT GGC GAT GAA ATC AAA CCG GAA GAA ATT ATT TTT GAA TGATT 992 Pro Val Tyr Gly Asp Glu Ile Lys Pro Glu Glu Ile Ile Phe Glu 240 245 250 TCTTTAGGCT TTGAATTCTT GACAGGGTGC GTTTTTATTG CTAAATTAGC AATTTTGTGA 1052 TCTTTTTGTT TTTCATTTTG AGATATAT 1080 277 amino acids amino acid single linear protein internal Signal Sequence 1...23 24 Met Lys Thr Asn Gly Leu Phe Lys Met Trp Gly Leu Phe Leu Val Leu -20 -15 -10 Ile Ala Leu Val Phe Asn Ala Cys Ser Asp Ser His Lys Glu Lys Lys -5 1 5 Asp Ala Leu Glu Val Ile Lys Gln Arg Gly Val Leu Lys Val Gly Val 10 15 20 25 Phe Ser Asp Lys Pro Pro Phe Gly Ser Val Asp Ser Lys Gly Lys Tyr 30 35 40 Gln Gly Tyr Asp Val Val Ile Ala Lys Arg Met Ala Leu Asp Leu Leu 45 50 55 Gly Asp Glu Asn Lys Ile Glu Phe Ile Pro Val Glu Ala Ser Ala Arg 60 65 70 Val Glu Phe Leu Lys Ala Asn Lys Val Asp Ile Ile Met Ala Asn Phe 75 80 85 Thr Arg 
Thr Lys Glu Arg Glu Lys Val Val Asp Phe Ala Lys Pro Tyr 90 95 100 105 Met Lys Val Ala Leu Gly Val Val Ser Lys Asp Gly Val Ile Lys Asn 110 115 120 Ile Glu Glu Leu Lys Asp Lys Glu Leu Ile Val Asn Lys Gly Thr Thr 125 130 135 Ala Asp Phe Tyr Phe Thr Lys Asn Tyr Pro Asn Ile Lys Leu Leu Lys 140 145 150 Phe Glu Gln Asn Thr Glu Thr Phe Leu Ala Leu Leu Asn Asn Lys Ala 155 160 165 Thr Ala Leu Ala His Asp Asn Thr Leu Leu Leu Ala Trp Thr Lys Gln 170 175 180 185 His Pro Glu Phe Lys Leu Gly Ile Thr Ser Leu Gly Asp Lys Asp Val 190 195 200 Ile Ala Pro Ala Ile Lys Lys Gly Asn Pro Lys Leu Leu Glu Trp Leu 205 210 215 Asn Asn Glu Ile Asp Ser Leu Ile Ser Ser Asp Phe Leu Lys Glu Ala 220 225 230 Tyr Gln Glu Thr Leu Ala Pro Val Tyr Gly Asp Glu Ile Lys Pro Glu 235 240 245 Glu Ile Ile Phe Glu 250
SEQ ID NO:25: 30 base pairs, nucleic acid, single, linear. GCCNCACAAT GGATAAAAAC AACAATAATC 30
SEQ ID NO:26: 26 base pairs, nucleic acid, single, linear. GCCNCTTTAT AGCTATTTTT TCCAAA 26
SEQ ID NO:27: 29 base pairs, nucleic acid, single, linear. GCCNTGCATC AATGTTCCTT TTTGTTTTG 29
SEQ ID NO:28: 29 base pairs, nucleic acid, single, linear. GCCNAAAATG TCAATTAAAA GGGTTAGAT 29
SEQ ID NO:29: 28 base pairs, nucleic acid, single, linear. GCCNATAGGC TATCGTATGG ATTAGAAC 28
SEQ ID NO:30: 29 base pairs, nucleic acid, single, linear. GCCNGAAATC ATTTTAAACG ACTCAAAAC 29
SEQ ID NO:31: 32 base pairs, nucleic acid, single, linear. GCCNTAGGAG ATATACCATG CTAGGAAACG TT 32
SEQ ID NO:32: 26 base pairs, nucleic acid, single, linear. GCCNGAGCCA GACGCTAAAG AGCTTG 26
SEQ ID NO:33: 31 base pairs, nucleic acid, single, linear. GCCNGAAATT AAAGTTCTAT TTTTAATTCC T 31
SEQ ID NO:34: 31 base pairs, nucleic acid, single, linear. GCCNTAAAGG CGAGGAATGA AAAGATTTGT T 31
SEQ ID NO:35: 29 base pairs, nucleic acid, single, linear. GCCNGCCGAG CAAGATTACT TTTTTAGGG 29
SEQ ID NO:36: 30 base pairs, nucleic acid, single, linear. GCCNTTCATG GCGATCGGAG TTTCAGCCTG 30
SEQ ID NO:37: 32 base pairs, nucleic acid, single, linear. GCCNATAATG GCTGAAAATT CTTTCAAAAA TG 32
SEQ ID NO:38: 31 base pairs, nucleic acid, single, linear. GCCNTTGGTT TTTTTTGATT ATGCTCATTT G 31
SEQ ID NO:39: 29 base pairs, nucleic acid, single, linear. GCCNCGTTTA GATCTTTTGA AACACGCTC 29
SEQ ID NO:40: 33 base pairs, nucleic acid, single, linear. GCCNTGGGGT TTCGCATGCG TTTATTATTG TGG 33
SEQ ID NO:41: 29 base pairs, nucleic acid, single, linear. GCCNGTTGAA GAGCATGAAA CAGATGCGG 29
SEQ ID NO:42: 29 base pairs, nucleic acid, single, linear. GCCNTTCAGA TTTTAGCCCC TTCTTTCAG 29
SEQ ID NO:43: 33 base pairs, nucleic acid, single, linear. GCCNTAGAGT TTATGCTGAT TTCATTAAAA ACA 33
SEQ ID NO:44: 28 base pairs, nucleic acid, single, linear. GCCNAACAGC ATTCAATTAG AAGAAACG 28
SEQ ID NO:45: 31 base pairs, nucleic acid, single, linear. GCCNAGCTCT TAAAATTTGT AAGTCAAATT C 31
SEQ ID NO:46: 32 base pairs, nucleic acid, single, linear. GCCNTAAAGG GGTTTGAATG GTATTTGACA GA 32
SEQ ID NO:47: 28 base pairs, nucleic acid, single, linear. GCCNTTTCAA AAAGAGTGGG TGCAACAA 28
SEQ ID NO:48: 27 base pairs, nucleic acid, single, linear. GCCNGCCTTT ATTGATTATA AGGGCGT 27
SEQ ID NO:49: 27 base pairs, nucleic acid, single, linear. GCCNTAGGGG AGGAATGATG CTCGCTT 27
SEQ ID NO:50: 27 base pairs, nucleic acid, single, linear. GCCNTTTATC AACACTCCAG TGGACGC 27
SEQ ID NO:51: 31 base pairs, nucleic acid, single, linear. GCCNACTCAA ACGATTTTAA TCTTTTTAGC G 31
SEQ ID NO:52: 34 base pairs, nucleic acid, single, linear. GCCNAAAGGC TAAAATGTTG AAATTTAAAT ATGG 34
SEQ ID NO:53: 30 base pairs, nucleic acid, single, linear. GCCNACAGAT TATGACAATT TAGAAGAAGA 30
SEQ ID NO:54: 31 base pairs, nucleic acid, single, linear. GCCNAAGCCT TAAAAAATAA ACGCATAATT C 31
SEQ ID NO:55: 27 base pairs, nucleic acid, single, linear. GCCNTGGGTT TTTTATGGGG GTGAGTT 27
SEQ ID NO:56: 24 base pairs, nucleic acid, single, linear. GCCNAGTCAA GCCGCATCAC CTAC 24
SEQ ID NO:57: 28 base pairs, nucleic acid, single, linear. GCCNAATCAA AAGCCTATGT TGTAGCCC 28
SEQ ID NO:58: 30 base pairs, nucleic acid, single, linear. GCCNAAAGGT GTTTTTATGA AAACAAACGG 30
SEQ ID NO:59: 31 base pairs, nucleic acid, single, linear. GCCNTGTTCT GATAGCCATA AAGAAAAAAA G 31
SEQ ID NO:60: 32 base pairs, nucleic acid, single, linear. GCCNGAAATC ATTCAAAAAT AATTTCTTCC GG 32
SEQ ID NO:61: 32 base pairs, nucleic acid, single, linear. CGCGGATCCG AACTCTTTTT AAGTCAAGCA AT 32
SEQ ID NO:62: 31 base pairs, nucleic acid, single, linear. CCGCTCGAGT TAAAATTTGT AAGTCAAATT C 31
SEQ ID NO:63: 24 base pairs, nucleic acid, single, linear. GGGAATTCTT GAAATTTAAA TATG 24
SEQ ID NO:64: 24 base pairs, nucleic acid, single, linear. CCGCTCGAGT TAAAAAATAA ACGC 24
SEQ ID NO:65: 33 base pairs, nucleic acid, single, linear. CGCGGATCCG AAAAGAAAAG GATAACCCCT TGC 33
SEQ ID NO:66: 30 base pairs, nucleic acid, single, linear, cDNA. CCGCTCGAGT CAAAAGCCTA TGTTGTAGCC 30
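As a consistency check on the primer entries, the following minimal sketch compares the declared length of SEQ ID NOS:61 to 66 with the number of bases actually listed, and scans each primer for a few common restriction enzyme recognition sequences. The sequences and lengths are copied from the listing above. The enzyme table (BamHI, XhoI, EcoRI) and the helper name `check_primer` are assumptions introduced for illustration; the listing itself does not name any enzyme.

```python
# Primer sequences from SEQ ID NOS:61-66 (spaces removed), each paired with
# the length declared for it in the sequence listing.
PRIMERS = {
    61: ("CGCGGATCCGAACTCTTTTTAAGTCAAGCAAT", 32),
    62: ("CCGCTCGAGTTAAAATTTGTAAGTCAAATTC", 31),
    63: ("GGGAATTCTTGAAATTTAAATATG", 24),
    64: ("CCGCTCGAGTTAAAAAATAAACGC", 24),
    65: ("CGCGGATCCGAAAAGAAAAGGATAACCCCTTGC", 33),
    66: ("CCGCTCGAGTCAAAAGCCTATGTTGTAGCC", 30),
}

# Assumption: a small table of standard recognition sequences chosen for
# illustration. The listing does not state which enzymes were intended.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG", "EcoRI": "GAATTC"}


def check_primer(seq, declared_length):
    """Return (length matches declaration, names of sites found in seq)."""
    length_ok = len(seq) == declared_length
    found = [name for name, site in SITES.items() if site in seq]
    return length_ok, found


if __name__ == "__main__":
    for seq_id in sorted(PRIMERS):
        seq, declared = PRIMERS[seq_id]
        length_ok, found = check_primer(seq, declared)
        status = "ok" if length_ok else "MISMATCH"
        print(f"SEQ ID NO:{seq_id}: length {len(seq)} ({status}), sites: {found}")
```

The same check could be extended to SEQ ID NOS:25 to 60, treating the placeholder N as a single listed position.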

What is claimed is:
 1. An isolated polynucleotide that encodes: (i) a polypeptide comprising an amino acid sequence that is homologous to the amino acid sequence of a Helicobacter membrane-associated polypeptide, wherein said amino acid sequence of said Helicobacter membrane-associated polypeptide is selected from the group consisting of the amino acid sequences as shown: in SEQ ID NO:2, beginning with an amino acid in any one of positions −22 to 5, preferably in position −22 or position 1, and ending with an amino acid in position 525 (GHPO 1012); in SEQ ID NO:4, beginning with an amino acid in any one of positions −25 to 5, preferably in position −25 or position 1, and ending with an amino acid in position 451 (GHPO 1190); in SEQ ID NO:6, beginning with an amino acid in any one of positions −25 to 5, preferably in position −25 or position 1, and ending with an amino acid in position 225 (GHPO 1398); in SEQ ID NO:8, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 310 (GHPO 1501); in SEQ ID NO:10, beginning with an amino acid in any one of positions −40 to 5, preferably in position −40 or position 1, and ending with an amino acid in position 182 (GHPO 1550); in SEQ ID NO:12, beginning with an amino acid in any one of positions −20 to 5, preferably in position −20 or position 1, and ending with an amino acid in position 603 (GHPO 1620); in SEQ ID NO:14, beginning with an amino acid in any one of positions −40 to 5, preferably in position −40 or position 1, and ending with an amino acid in position 232 (GHPO 276); in SEQ ID NO:16, beginning with an amino acid in any one of positions −37 to 5, preferably in position −37 or position 1, and ending with an amino acid in position 190 (GHPO 329); in SEQ ID NO:18, beginning with an amino acid in any one of positions −30 to 5, preferably in position −30 or position 1, and ending with an amino acid in position 989 (GHPO 470); in SEQ ID NO:20, beginning with an amino acid in any one of positions −19 to 5, preferably in position −19 or position 1, and ending with an amino acid in position 295 (GHPO 574); in SEQ ID NO:22, beginning with an amino acid in any one of positions −21 to 5, preferably in position −21 or position 1, and ending with an amino acid in position 642 (GHPO 689); and in SEQ ID NO:24, beginning with an amino acid in any one of positions −23 to 5, preferably in position −23 or position 1, and ending with an amino acid in position 254 (GHPO 706); or (ii) a derivative of said polypeptide encoded by said polynucleotide.
 2. An isolated polynucleotide that encodes: (i) a polypeptide comprising an amino acid sequence that is homologous to an amino acid sequence selected from the group consisting of the amino acid sequences as shown: in SEQ ID NO:2, beginning with an amino acid in position −22 and ending with an amino acid in position 525 (GHPO 1012); in SEQ ID NO:4, beginning with an amino acid in position −25 and ending with an amino acid in position 451 (GHPO 1190); in SEQ ID NO:6, beginning with an amino acid in position −25 and ending with an amino acid in position 225 (GHPO 1398); in SEQ ID NO:8, beginning with an amino acid in position −19 and ending with an amino acid in position 310 (GHPO 1501); in SEQ ID NO: 10, beginning with an amino acid in position −40 and ending with an amino acid in position 182 (GHPO 1550); in SEQ ID NO: 12, beginning with an amino acid in position −20 and ending with an amino acid in position 603 (GHPO 1620); in SEQ ID NO: 14, beginning with an amino acid in position −40 and ending with an amino acid in position 232 (GHPO 276); in SEQ ID NO: 16, beginning with an amino acid in position −37 and ending with an amino acid in position 190 (GHPO 329); in SEQ ID NO: 18, beginning with an amino acid in position −30 and ending with an amino acid in position 989 (GHPO 470); in SEQ ID NO:20, beginning with an amino acid in position −19 and ending with an amino acid in position 295 (GHPO 574); in SEQ ID NO:22, beginning with an amino acid in position −21 and ending with an amino acid in position 642 (GHPO 689); and in SEQ ID NO:24, beginning with an amino acid in position −23 and ending with an amino acid in position 254 (GHPO 706); or (ii) a derivative of said polypeptide encoded by said polynucleotide.
 3. The isolated polynucleotide of claim 1, which encodes the mature form of: (i) a polypeptide comprising an amino acid sequence that is homologous to an amino acid sequence selected from the group consisting of the amino acid sequences as shown: in SEQ ID NO:2, beginning with an amino acid in position −22 and ending with an amino acid in position 525 (GHPO 1012); in SEQ ID NO:4, beginning with an amino acid in position −25 and ending with an amino acid in position 451 (GHPO 1190); in SEQ ID NO:6, beginning with an amino acid in position −25 and ending with an amino acid in position 225 (GHPO 1398); in SEQ ID NO:8, beginning with an amino acid in position −19 and ending with an amino acid in position 310 (GHPO 1501); in SEQ ID NO: 10, beginning with an amino acid in position −40 and ending with an amino acid in position 182 (GHPO 1550); in SEQ ID NO: 12, beginning with an amino acid in position −20 and ending with an amino acid in position 603 (GHPO 1620); in SEQ ID NO: 14, beginning with an amino acid in position −40 and ending with an amino acid in position 232 (GHPO 276); in SEQ ID NO: 16, beginning with an amino acid in position −37 and ending with an amino acid in position 190 (GHPO 329); in SEQ ID NO: 18, beginning with an amino acid in position −30 and ending with an amino acid in position 989 (GHPO 470); in SEQ ID NO:20, beginning with an amino acid in position −19 and ending with an amino acid in position 295 (GHPO 574); in SEQ ID NO:22, beginning with an amino acid in position −21 and ending with an amino acid in position 642 (GHPO 689); and in SEQ ID NO:24, beginning with an amino acid in position −23 and ending with an amino acid in position 254 (GHPO 706); or (ii) a derivative of said polypeptide.
 4. The isolated polynucleotide of claim 1, 2, or 3, wherein the polynucleotide is a DNA molecule.
 5. The isolated polynucleotide of claim 1, which is a DNA molecule that can be amplified and/or cloned by polymerase chain reaction from a Helicobacter genome, using either:
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:25, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:27, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:26, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:27, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:28, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:30, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:29, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:30, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:31, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:33, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:32, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:33, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:34, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:36, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:35, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:36, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:37, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:39, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:38, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:39, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:40, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:42, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:41, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:42, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:43, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:45, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:44, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:45, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:46, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:48, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:47, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:48, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:49, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:51, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:50, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:51, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:52, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:54, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:53, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:54, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:55, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:57, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:56, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:57, wherein N is a restriction site;
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:58, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:60, wherein N is a restriction site; or
 a 5′ oligonucleotide primer having a sequence as shown in SEQ ID NO:59, wherein N is a restriction site, and a 3′ oligonucleotide primer having a sequence as shown in SEQ ID NO:60, wherein N is a restriction site.
 6. The isolated DNA molecule of claim 5, which can be amplified and/or cloned by the polymerase chain reaction from a Helicobacter pylori genome.
 7. The isolated polynucleotide of claim 1, which is a DNA molecule that encodes the mature form or a derivative of a polypeptide encoded by the DNA molecule of claim 5.
 8. The isolated polynucleotide of claim 1, which is a DNA molecule that encodes the mature form or a derivative of a polypeptide encoded by the DNA molecule of claim 6.
 9. A compound, in a substantially purified form, that is the mature form or a derivative of a polypeptide comprising an amino acid sequence that is homologous to an amino acid sequence of a polypeptide associated with the Helicobacter membrane, which is selected from the group consisting of the amino acid sequences as shown: in SEQ ID NO:2, beginning with an amino acid in position −22 and ending with an amino acid in position 525 (GHPO 1012); in SEQ ID NO:4, beginning with an amino acid in position −25 and ending with an amino acid in position 451 (GHPO 1190); in SEQ ID NO:6, beginning with an amino acid in position −25 and ending with an amino acid in position 225 (GHPO 1398); in SEQ ID NO:8, beginning with an amino acid in position −19 and ending with an amino acid in position 310 (GHPO 1501); in SEQ ID NO:10, beginning with an amino acid in position −40 and ending with an amino acid in position 182 (GHPO 1550); in SEQ ID NO:12, beginning with an amino acid in position −20 and ending with an amino acid in position 603 (GHPO 1620); in SEQ ID NO:14, beginning with an amino acid in position −40 and ending with an amino acid in position 232 (GHPO 276); in SEQ ID NO:16, beginning with an amino acid in position −37 and ending with an amino acid in position 190 (GHPO 329); in SEQ ID NO:18, beginning with an amino acid in position −30 and ending with an amino acid in position 989 (GHPO 470); in SEQ ID NO:20, beginning with an amino acid in position −19 and ending with an amino acid in position 295 (GHPO 574); in SEQ ID NO:22, beginning with an amino acid in position −21 and ending with an amino acid in position 642 (GHPO 689); and in SEQ ID NO:24, beginning with an amino acid in position −23 and ending with an amino acid in position 254 (GHPO 706); or (ii) a derivative of said polypeptide.
 10. The compound of claim 9, which is the mature form or a derivative of a polypeptide encoded by a DNA molecule of claim 5.
 11. The compound of claim 9, which is the mature form or a derivative of a polypeptide encoded by a DNA molecule of claim 6.
 12. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of a compound of claim 9, 10, or 11.
 13. The method of claim 12, further comprising administering an antibiotic, an antisecretory agent, a bismuth salt, or a combination thereof.
 14. The method of claim 13, wherein said antibiotic is selected from the group consisting of amoxicillin, clarithromycin, tetracycline, metronidazole, and erythromycin.
 15. The method of claim 13, wherein said bismuth salt is selected from the group consisting of bismuth subcitrate and bismuth subsalicylate.
 16. The method of claim 13, wherein said antisecretory agent is a proton pump inhibitor.
 17. The method of claim 16, wherein said proton pump inhibitor is selected from the group consisting of omeprazole, lansoprazole, and pantoprazole.
 18. The method of claim 13, wherein said antisecretory agent is an H₂-receptor antagonist.
 19. The method of claim 18, wherein said H₂-receptor antagonist is selected from the group consisting of ranitidine, cimetidine, famotidine, nizatidine, and roxatidine.
 20. The method of claim 13, wherein said antisecretory agent is a prostaglandin analog.
 21. The method of claim 20, wherein said prostaglandin analog is misoprostol or enprostil.
 22. The method of claim 12, which further comprises administering a prophylactically or therapeutically effective amount of a second Helicobacter polypeptide or a derivative thereof.
 23. The method of claim 22, wherein the second Helicobacter polypeptide is a Helicobacter urease, a subunit, or a derivative thereof.
 24. A composition comprising a compound of claim 9, 10, or 11, together with a physiologically acceptable diluent or carrier.
 25. The composition of claim 24, further comprising an adjuvant.
 26. The composition of claim 24, further comprising a second Helicobacter polypeptide or a derivative thereof.
 27. The composition of claim 26, wherein said second Helicobacter polypeptide is a Helicobacter urease, or a subunit or a derivative thereof.
 28. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of a polynucleotide of claim 1, 2, or 3.
 29. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of a polynucleotide of claim 5, 6, or 7.
 30. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of a polynucleotide of claim 8.
 31. A composition comprising a viral vector, in the genome of which is inserted a DNA molecule of claim 4, said DNA molecule being placed under conditions for expression in a mammalian cell and said viral vector being admixed with a physiologically acceptable diluent or carrier.
 32. The composition of claim 31, wherein said viral vector is a poxvirus.
 33. A composition that comprises a bacterial vector comprising a DNA molecule of claim 4, said DNA molecule being placed under conditions for expression and said bacterial vector being admixed with a physiologically acceptable diluent or carrier.
 34. The composition of claim 33, wherein said vector is selected from the group consisting of Shigella, Salmonella, Vibrio cholerae, Lactobacillus, Bacille bilié de Calmette-Guérin, and Streptococcus.
 35. A composition comprising a polynucleotide of claim 1, 2, or 3, together with a physiologically acceptable diluent or carrier.
 36. The composition of claim 35, wherein said polynucleotide is a DNA molecule that is inserted in a plasmid that is unable to replicate and to substantially integrate in a mammalian genome and is placed under conditions for expression in a mammalian cell.
 37. An expression cassette comprising a DNA molecule of claim 4, said DNA molecule being placed under conditions for expression in a procaryotic or eucaryotic cell.
 38. A process for producing a compound of claim 9, which comprises culturing a procaryotic or eucaryotic cell transformed or transfected with an expression cassette of claim 37, and recovering said compound from the cell culture.
 39. A method of preventing or treating Helicobacter infection in a mammal, said method comprising administering to said mammal a prophylactically or therapeutically effective amount of an antibody that binds to the compound of claim 9, 10, or 11.